About this documentation
************************

Python's documentation is generated from reStructuredText sources using Sphinx, a documentation generator originally created for Python and now maintained as an independent project.

Development of the documentation and its toolchain is an entirely volunteer effort, just like Python itself. If you want to contribute, please take a look at the Dealing with Bugs page for information on how to do so. New volunteers are always welcome!

Many thanks go to:

* Fred L. Drake, Jr., the creator of the original Python documentation toolset and author of much of the content;

* the Docutils project for creating reStructuredText and the Docutils suite;

* Fredrik Lundh for his Alternative Python Reference project from which Sphinx got many good ideas.


Contributors to the Python documentation
========================================

Many people have contributed to the Python language, the Python standard library, and the Python documentation. See Misc/ACKS in the Python source distribution for a partial list of contributors.

It is only with the input and contributions of the Python community that Python has such wonderful documentation – Thank You!


Dealing with Bugs
*****************

Python is a mature programming language which has established a reputation for stability. In order to maintain this reputation, the developers would like to know of any deficiencies you find in Python.

It can sometimes be faster to fix bugs yourself and contribute patches to Python, as this streamlines the process and involves fewer people. Learn how to contribute.


Documentation bugs
==================

If you find a bug in this documentation or would like to propose an improvement, please submit a bug report on the tracker. If you have a suggestion on how to fix it, include that as well. You can also open a discussion item on our Documentation Discourse forum.

If you find a bug in the theme (HTML / CSS / JavaScript) of the documentation, please submit a bug report on the python-doc-theme bug tracker.

If you're short on time, you can also email documentation bug reports to docs@python.org (behavioral bugs can be sent to python-list@python.org). 'docs@' is a mailing list run by volunteers; your request will be noticed, though it may take a while to be processed.

See also:

  Documentation bugs
     A list of documentation bugs that have been submitted to the Python issue tracker.

  Issue Tracking
     Overview of the process involved in reporting an improvement on the tracker.

  Helping with Documentation
     Comprehensive guide for individuals who are interested in contributing to Python documentation.

  Documentation Translations
     A list of GitHub pages for documentation translation and their primary contacts.


Using the Python issue tracker
==============================

Issue reports for Python itself should be submitted via the GitHub issues tracker (https://github.com/python/cpython/issues). The GitHub issues tracker offers a web form which allows pertinent information to be entered and submitted to the developers.

The first step in filing a report is to determine whether the problem has already been reported. The advantage in doing so, aside from saving the developers' time, is that you learn what has been done to fix it; it may be that the problem has already been fixed for the next release, or additional information is needed (in which case you are welcome to provide it if you can!). To do this, search the tracker using the search box at the top of the page.

If the problem you're reporting is not already in the list, log in to GitHub. If you don't already have a GitHub account, create a new account using the "Sign up" link. It is not possible to submit a bug report anonymously.

Once you are logged in, you can submit an issue. Click on the "New issue" button in the top bar to report a new issue.

The submission form has two fields, "Title" and "Comment".

For the "Title" field, enter a *very* short description of the problem; fewer than ten words is good.

In the "Comment" field, describe the problem in detail, including what you expected to happen and what did happen. Be sure to include whether any extension modules were involved, and what hardware and software platform you were using (including version information as appropriate).

Each issue report will be reviewed by a developer who will determine what needs to be done to correct the problem. You will receive an update each time an action is taken on the issue.

See also:

  How to Report Bugs Effectively
     Article which goes into some detail about how to create a useful bug report. This describes what kind of information is useful and why it is useful.

  Bug Writing Guidelines
     Information about writing a good bug report. Some of this is specific to the Mozilla project, but describes general good practices.


Getting started contributing to Python yourself
===============================================

Beyond just reporting bugs that you find, you are also welcome to submit patches to fix them. You can find more information on how to get started patching Python in the Python Developer's Guide. If you have questions, the core-mentorship mailing list is a friendly place to get answers to any and all questions pertaining to the process of fixing issues in Python.


Abstract Objects Layer
**********************

The functions in this chapter interact with Python objects regardless of their type, or with wide classes of object types (e.g. all numerical types, or all sequence types). When used on object types for which they do not apply, they will raise a Python exception.

It is not possible to use these functions on objects that are not properly initialized, such as a list object that has been created by "PyList_New()", but whose items have not been set to some non-"NULL" value yet.

* Object Protocol

* Call Protocol

  * The *tp_call* Protocol

  * The Vectorcall Protocol

    * Recursion Control

  * Vectorcall Support API

  * Object Calling API

  * Call Support API

* Number Protocol

* Sequence Protocol

* Mapping Protocol

* Iterator Protocol

* Buffer Protocol

  * Buffer structure

  * Buffer request types

    * request-independent fields

    * readonly, format

    * shape, strides, suboffsets

    * contiguity requests

    * compound requests

  * Complex arrays

    * NumPy-style: shape and strides

    * PIL-style: shape, strides and suboffsets

  * Buffer-related functions


Allocating Objects on the Heap
******************************

PyObject *_PyObject_New(PyTypeObject *type)
   *Return value: New reference.*

PyVarObject *_PyObject_NewVar(PyTypeObject *type, Py_ssize_t size)
   *Return value: New reference.*

PyObject *PyObject_Init(PyObject *op, PyTypeObject *type)
   *Return value: Borrowed reference.* *Part of the Stable ABI.*

   Initialize a newly allocated object *op* with its type and initial reference. Returns the initialized object. Other fields of the object are not affected.
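
For instance, "PyObject_Init()" is typically called on memory obtained from a raw allocator such as "PyObject_Malloc()". The following is a minimal, hypothetical sketch (the "MyObject" struct and "MyType" type object are invented for the example); the memory would later be released in the type's "tp_dealloc" handler with "PyObject_Free()":

   typedef struct {
       PyObject_HEAD
       int payload;            /* hypothetical instance data */
   } MyObject;

   static PyObject *
   make_myobject(void)
   {
       MyObject *op = (MyObject *)PyObject_Malloc(sizeof(MyObject));
       if (op == NULL) {
           return PyErr_NoMemory();
       }
       /* Set ob_type and the initial reference count; other fields
          (here: payload) are left for us to initialize. */
       PyObject_Init((PyObject *)op, &MyType);
       op->payload = 0;
       return (PyObject *)op;
   }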

PyVarObject *PyObject_InitVar(PyVarObject *op, PyTypeObject *type, Py_ssize_t size)
   *Return value: Borrowed reference.* *Part of the Stable ABI.*

   This does everything "PyObject_Init()" does, and also initializes the length information for a variable-size object.

PyObject_New(TYPE, typeobj)
   Allocate a new Python object using the C structure type *TYPE* and the Python type object *typeobj* ("PyTypeObject*"). Fields not defined by the Python object header are not initialized. The caller will own the only reference to the object (i.e. its reference count will be one). The size of the memory allocation is determined from the "tp_basicsize" field of the type object.

   Note that this function is unsuitable if *typeobj* has "Py_TPFLAGS_HAVE_GC" set. For such objects, use "PyObject_GC_New()" instead.

PyObject_NewVar(TYPE, typeobj, size)
   Allocate a new Python object using the C structure type *TYPE* and the Python type object *typeobj* ("PyTypeObject*"). Fields not defined by the Python object header are not initialized. The allocated memory allows for the *TYPE* structure plus *size* ("Py_ssize_t") fields of the size given by the "tp_itemsize" field of *typeobj*. This is useful for implementing objects like tuples, which are able to determine their size at construction time. Embedding the array of fields into the same allocation decreases the number of allocations, improving the memory management efficiency.

   Note that this function is unsuitable if *typeobj* has "Py_TPFLAGS_HAVE_GC" set. For such objects, use "PyObject_GC_NewVar()" instead.

void PyObject_Del(void *op)
   Releases memory allocated to an object using "PyObject_New()" or "PyObject_NewVar()". This is normally called from the "tp_dealloc" handler specified in the object's type. The fields of the object should not be accessed after this call as the memory is no longer a valid Python object.

PyObject _Py_NoneStruct
   Object which is visible in Python as "None". This should only be accessed using the "Py_None" macro, which evaluates to a pointer to this object.

See also:

  Module Objects
     To allocate and create extension modules.


API and ABI Versioning
**********************

CPython exposes its version number in the following macros. Note that these correspond to the version the code is **built** with, not necessarily the version used at **run time**.

See C API Stability for a discussion of API and ABI stability across versions.

PY_MAJOR_VERSION
   The "3" in "3.4.1a2".

PY_MINOR_VERSION
   The "4" in "3.4.1a2".

PY_MICRO_VERSION
   The "1" in "3.4.1a2".

PY_RELEASE_LEVEL
   The "a" in "3.4.1a2". This can be "0xA" for alpha, "0xB" for beta, "0xC" for release candidate or "0xF" for final.

PY_RELEASE_SERIAL
   The "2" in "3.4.1a2". Zero for final releases.

PY_VERSION_HEX
   The Python version number encoded in a single integer.

   The underlying version information can be found by treating it as a 32-bit number in the following manner:

   +---------+---------------------------+---------------------------+----------------------------+
   | Bytes   | Bits (big endian order)   | Meaning                   | Value for "3.4.1a2"        |
   |=========|===========================|===========================|============================|
   | 1       | 1-8                       | "PY_MAJOR_VERSION"        | "0x03"                     |
   +---------+---------------------------+---------------------------+----------------------------+
   | 2       | 9-16                      | "PY_MINOR_VERSION"        | "0x04"                     |
   +---------+---------------------------+---------------------------+----------------------------+
   | 3       | 17-24                     | "PY_MICRO_VERSION"        | "0x01"                     |
   +---------+---------------------------+---------------------------+----------------------------+
   | 4       | 25-28                     | "PY_RELEASE_LEVEL"        | "0xA"                      |
   |         +---------------------------+---------------------------+----------------------------+
   |         | 29-32                     | "PY_RELEASE_SERIAL"       | "0x2"                      |
   +---------+---------------------------+---------------------------+----------------------------+

   Thus "3.4.1a2" is hexversion "0x030401a2" and "3.10.0" is hexversion "0x030a00f0".

   Use this for numeric comparisons, e.g. "#if PY_VERSION_HEX >= ...".

   This version is also available via the symbol "Py_Version".

const unsigned long Py_Version
   *Part of the Stable ABI since version 3.11.*

   The Python runtime version number encoded in a single constant integer, with the same format as the "PY_VERSION_HEX" macro. This contains the Python version used at run time.

   Added in version 3.11.

All the given macros are defined in Include/patchlevel.h.


Parsing arguments and building values
*************************************

These functions are useful when creating your own extension functions and methods. Additional information and examples are available in Extending and Embedding the Python Interpreter.

The first three functions described, "PyArg_ParseTuple()", "PyArg_ParseTupleAndKeywords()", and "PyArg_Parse()", all use *format strings*, which tell the function about the expected arguments. The format strings use the same syntax for each of these functions.


Parsing arguments
=================

A format string consists of zero or more "format units." A format unit describes one Python object; it is usually a single character or a parenthesized sequence of format units. With a few exceptions, a format unit that is not a parenthesized sequence normally corresponds to a single address argument to these functions. In the following description, the quoted form is the format unit; the entry in (round) parentheses is the Python object type that matches the format unit; and the entry in [square] brackets is the type of the C variable(s) whose address should be passed.


Strings and buffers
-------------------

Note: On Python 3.12 and older, the macro "PY_SSIZE_T_CLEAN" must be defined before including "Python.h" to use all "#" variants of formats ("s#", "y#", etc.) explained below. This is not necessary on Python 3.13 and later.

These formats allow accessing an object as a contiguous chunk of memory. You don't have to provide raw storage for the returned unicode or bytes area.

Unless otherwise stated, buffers are not NUL-terminated.

There are three ways strings and buffers can be converted to C:

* Formats such as "y*" and "s*" fill a "Py_buffer" structure. This locks the underlying buffer so that the caller can subsequently use the buffer even inside a "Py_BEGIN_ALLOW_THREADS" block without the risk of mutable data being resized or destroyed.

  As a result, **you have to call** "PyBuffer_Release()" after you have finished processing the data (or in any early abort case).

* The "es", "es#", "et" and "et#" formats allocate the result buffer. **You have to call** "PyMem_Free()" after you have finished processing the data (or in any early abort case).

* Other formats take a "str" or a read-only *bytes-like object*, such as "bytes", and provide a "const char *" pointer to its buffer. In this case the buffer is "borrowed": it is managed by the corresponding Python object, and shares the lifetime of this object. You won't have to release any memory yourself.

  To ensure that the underlying buffer may be safely borrowed, the object's "PyBufferProcs.bf_releasebuffer" field must be "NULL". This disallows common mutable objects such as "bytearray", but also some read-only objects such as "memoryview" of "bytes".

  Besides this "bf_releasebuffer" requirement, there is no check to verify whether the input object is immutable (e.g. whether it would honor a request for a writable buffer, or whether another thread can mutate the data).

"s" ("str") [const char *]
   Convert a Unicode object to a C pointer to a character string. A pointer to an existing string is stored in the character pointer variable whose address you pass. The C string is NUL-terminated. The Python string must not contain embedded null code points; if it does, a "ValueError" exception is raised. Unicode objects are converted to C strings using "'utf-8'" encoding. If this conversion fails, a "UnicodeError" is raised.

   Note: This format does not accept *bytes-like objects*. If you want to accept filesystem paths and convert them to C character strings, it is preferable to use the "O&" format with "PyUnicode_FSConverter()" as *converter*.

   Changed in version 3.5: Previously, "TypeError" was raised when embedded null code points were encountered in the Python string.

"s*" ("str" or *bytes-like object*) [Py_buffer]
   This format accepts Unicode objects as well as bytes-like objects. It fills a "Py_buffer" structure provided by the caller. In this case the resulting C string may contain embedded NUL bytes. Unicode objects are converted to C strings using "'utf-8'" encoding.

"s#" ("str", read-only *bytes-like object*) [const char *, "Py_ssize_t"]
   Like "s*", except that it provides a borrowed buffer. The result is stored into two C variables, the first one a pointer to a C string, the second one its length. The string may contain embedded null bytes. Unicode objects are converted to C strings using "'utf-8'" encoding.

"z" ("str" or "None") [const char *]
   Like "s", but the Python object may also be "None", in which case the C pointer is set to "NULL".

"z*" ("str", *bytes-like object* or "None") [Py_buffer]
   Like "s*", but the Python object may also be "None", in which case the "buf" member of the "Py_buffer" structure is set to "NULL".

"z#" ("str", read-only *bytes-like object* or "None") [const char *, "Py_ssize_t"]
   Like "s#", but the Python object may also be "None", in which case the C pointer is set to "NULL".

"y" (read-only *bytes-like object*) [const char *]
   This format converts a bytes-like object to a C pointer to a borrowed character string; it does not accept Unicode objects. The bytes buffer must not contain embedded null bytes; if it does, a "ValueError" exception is raised.

   Changed in version 3.5: Previously, "TypeError" was raised when embedded null bytes were encountered in the bytes buffer.
"y*" (*bytes-like object*) [Py_buffer] This variant on "s*" doesn’t accept Unicode objects, only bytes- like objects. **This is the recommended way to accept binary data.** "y#" (read-only *bytes-like object*) [const char *, "Py_ssize_t"] This variant on "s#" doesn’t accept Unicode objects, only bytes- like objects. "S" ("bytes") [PyBytesObject *] Requires that the Python object is a "bytes" object, without attempting any conversion. Raises "TypeError" if the object is not a bytes object. The C variable may also be declared as PyObject*. "Y" ("bytearray") [PyByteArrayObject *] Requires that the Python object is a "bytearray" object, without attempting any conversion. Raises "TypeError" if the object is not a "bytearray" object. The C variable may also be declared as PyObject*. "U" ("str") [PyObject *] Requires that the Python object is a Unicode object, without attempting any conversion. Raises "TypeError" if the object is not a Unicode object. The C variable may also be declared as PyObject*. "w*" (read-write *bytes-like object*) [Py_buffer] This format accepts any object which implements the read-write buffer interface. It fills a "Py_buffer" structure provided by the caller. The buffer may contain embedded null bytes. The caller have to call "PyBuffer_Release()" when it is done with the buffer. "es" ("str") [const char *encoding, char **buffer] This variant on "s" is used for encoding Unicode into a character buffer. It only works for encoded data without embedded NUL bytes. This format requires two arguments. The first is only used as input, and must be a const char* which points to the name of an encoding as a NUL-terminated string, or "NULL", in which case "'utf-8'" encoding is used. An exception is raised if the named encoding is not known to Python. The second argument must be a char**; the value of the pointer it references will be set to a buffer with the contents of the argument text. The text will be encoded in the encoding specified by the first argument. "PyArg_ParseTuple()" will allocate a buffer of the needed size, copy the encoded data into this buffer and adjust **buffer* to reference the newly allocated storage. The caller is responsible for calling "PyMem_Free()" to free the allocated buffer after use. "et" ("str", "bytes" or "bytearray") [const char *encoding, char **buffer] Same as "es" except that byte string objects are passed through without recoding them. Instead, the implementation assumes that the byte string object uses the encoding passed in as parameter. "es#" ("str") [const char *encoding, char **buffer, "Py_ssize_t" *buffer_length] This variant on "s#" is used for encoding Unicode into a character buffer. Unlike the "es" format, this variant allows input data which contains NUL characters. It requires three arguments. The first is only used as input, and must be a const char* which points to the name of an encoding as a NUL-terminated string, or "NULL", in which case "'utf-8'" encoding is used. An exception is raised if the named encoding is not known to Python. The second argument must be a char**; the value of the pointer it references will be set to a buffer with the contents of the argument text. The text will be encoded in the encoding specified by the first argument. The third argument must be a pointer to an integer; the referenced integer will be set to the number of bytes in the output buffer. 

   There are two modes of operation:

   If **buffer* points to a "NULL" pointer, the function will allocate a buffer of the needed size, copy the encoded data into this buffer and set **buffer* to reference the newly allocated storage. The caller is responsible for calling "PyMem_Free()" to free the allocated buffer after usage.

   If **buffer* points to a non-"NULL" pointer (an already allocated buffer), "PyArg_ParseTuple()" will use this location as the buffer and interpret the initial value of **buffer_length* as the buffer size. It will then copy the encoded data into the buffer and NUL-terminate it. If the buffer is not large enough, a "ValueError" will be set.

   In both cases, **buffer_length* is set to the length of the encoded data without the trailing NUL byte.

"et#" ("str", "bytes" or "bytearray") [const char *encoding, char **buffer, "Py_ssize_t" *buffer_length]
   Same as "es#" except that byte string objects are passed through without recoding them. Instead, the implementation assumes that the byte string object uses the encoding passed in as parameter.

Changed in version 3.12: "u", "u#", "Z", and "Z#" are removed because they used a legacy "Py_UNICODE*" representation.


Numbers
-------

These formats allow representing Python numbers or single characters as C numbers. Formats that require "int", "float" or "complex" can also use the corresponding special methods "__index__()", "__float__()" or "__complex__()" to convert the Python object to the required type.

For signed integer formats, "OverflowError" is raised if the value is out of range for the C type. For unsigned integer formats, no range checking is done — the most significant bits are silently truncated when the receiving field is too small to receive the value.

"b" ("int") [unsigned char]
   Convert a nonnegative Python integer to an unsigned tiny integer, stored in a C unsigned char.

"B" ("int") [unsigned char]
   Convert a Python integer to a tiny integer without overflow checking, stored in a C unsigned char.

"h" ("int") [short int]
   Convert a Python integer to a C short int.

"H" ("int") [unsigned short int]
   Convert a Python integer to a C unsigned short int, without overflow checking.

"i" ("int") [int]
   Convert a Python integer to a plain C int.

"I" ("int") [unsigned int]
   Convert a Python integer to a C unsigned int, without overflow checking.

"l" ("int") [long int]
   Convert a Python integer to a C long int.

"k" ("int") [unsigned long]
   Convert a Python integer to a C unsigned long without overflow checking.

"L" ("int") [long long]
   Convert a Python integer to a C long long.

"K" ("int") [unsigned long long]
   Convert a Python integer to a C unsigned long long without overflow checking.

"n" ("int") ["Py_ssize_t"]
   Convert a Python integer to a C "Py_ssize_t".

"c" ("bytes" or "bytearray" of length 1) [char]
   Convert a Python byte, represented as a "bytes" or "bytearray" object of length 1, to a C char.

   Changed in version 3.3: Allow "bytearray" objects.

"C" ("str" of length 1) [int]
   Convert a Python character, represented as a "str" object of length 1, to a C int.

"f" ("float") [float]
   Convert a Python floating-point number to a C float.

"d" ("float") [double]
   Convert a Python floating-point number to a C double.

"D" ("complex") [Py_complex]
   Convert a Python complex number to a C "Py_complex" structure.


Other objects
-------------

"O" (object) [PyObject *]
   Store a Python object (without any conversion) in a C object pointer. The C program thus receives the actual object that was passed. A new *strong reference* to the object is not created (i.e.
   its reference count is not increased). The pointer stored is not "NULL".

"O!" (object) [*typeobject*, PyObject *]
   Store a Python object in a C object pointer. This is similar to "O", but takes two C arguments: the first is the address of a Python type object, the second is the address of the C variable (of type PyObject*) into which the object pointer is stored. If the Python object does not have the required type, "TypeError" is raised.

"O&" (object) [*converter*, *address*]
   Convert a Python object to a C variable through a *converter* function. This takes two arguments: the first is a function, the second is the address of a C variable (of arbitrary type), converted to void*. The *converter* function in turn is called as follows:

      status = converter(object, address);

   where *object* is the Python object to be converted and *address* is the void* argument that was passed to the "PyArg_Parse*" function. The returned *status* should be "1" for a successful conversion and "0" if the conversion has failed. When the conversion fails, the *converter* function should raise an exception and leave the content of *address* unmodified.

   If the *converter* returns "Py_CLEANUP_SUPPORTED", it may get called a second time if the argument parsing eventually fails, giving the converter a chance to release any memory that it had already allocated. In this second call, the *object* parameter will be "NULL"; *address* will have the same value as in the original call.

   Examples of converters: "PyUnicode_FSConverter()" and "PyUnicode_FSDecoder()".

   Changed in version 3.1: "Py_CLEANUP_SUPPORTED" was added.

"p" ("bool") [int]
   Tests the value passed in for truth (a boolean **p**redicate) and converts the result to its equivalent C true/false integer value. Sets the int to "1" if the expression was true and "0" if it was false. This accepts any valid Python value. See Truth Value Testing for more information about how Python tests values for truth.

   Added in version 3.3.

"(items)" ("tuple") [*matching-items*]
   The object must be a Python sequence whose length is the number of format units in *items*. The C arguments must correspond to the individual format units in *items*. Format units for sequences may be nested.

A few other characters have a meaning in a format string. These may not occur inside nested parentheses. They are:

"|"
   Indicates that the remaining arguments in the Python argument list are optional. The C variables corresponding to optional arguments should be initialized to their default value — when an optional argument is not specified, "PyArg_ParseTuple()" does not touch the contents of the corresponding C variable(s).

"$"
   "PyArg_ParseTupleAndKeywords()" only: Indicates that the remaining arguments in the Python argument list are keyword-only. Currently, all keyword-only arguments must also be optional arguments, so "|" must always be specified before "$" in the format string.

   Added in version 3.3.

":"
   The list of format units ends here; the string after the colon is used as the function name in error messages (the "associated value" of the exception that "PyArg_ParseTuple()" raises).

";"
   The list of format units ends here; the string after the semicolon is used as the error message *instead* of the default error message. ":" and ";" mutually exclude each other.

Note that any Python object references which are provided to the caller are *borrowed* references; do not release them (i.e. do not decrement their reference count)!
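
As an illustration of several of these format units working together, here is a minimal, hypothetical sketch (the function and parameter names are invented for the example): it parses a required string, an optional int, and a keyword-only flag, then rebuilds the values with "Py_BuildValue()" (described below):

   /* A hypothetical METH_VARARGS | METH_KEYWORDS function: a required
      UTF-8 string ("s"), an optional int ("i") after "|", and a
      keyword-only flag ("p") after "$". */
   static PyObject *
   example(PyObject *self, PyObject *args, PyObject *kwargs)
   {
       static char *keywords[] = {"name", "count", "strict", NULL};
       const char *name;
       int count = 1;     /* defaults survive when optional args are absent */
       int strict = 0;

       if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|i$p:example",
                                        keywords, &name, &count, &strict)) {
           return NULL;
       }
       /* "name" is borrowed from the argument object; do not free it. */
       return Py_BuildValue("(sii)", name, count, strict);
   }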

Additional arguments passed to these functions must be addresses of variables whose type is determined by the format string; these are used to store values from the input tuple. There are a few cases, as described in the list of format units above, where these parameters are used as input values; they should match what is specified for the corresponding format unit in that case.

For the conversion to succeed, the *arg* object must match the format and the format must be exhausted. On success, the "PyArg_Parse*" functions return true, otherwise they return false and raise an appropriate exception. When the "PyArg_Parse*" functions fail due to conversion failure in one of the format units, the variables at the addresses corresponding to that and the following format units are left untouched.


API Functions
-------------

int PyArg_ParseTuple(PyObject *args, const char *format, ...)
   *Part of the Stable ABI.*

   Parse the parameters of a function that takes only positional parameters into local variables. Returns true on success; on failure, it returns false and raises the appropriate exception.

int PyArg_VaParse(PyObject *args, const char *format, va_list vargs)
   *Part of the Stable ABI.*

   Identical to "PyArg_ParseTuple()", except that it accepts a va_list rather than a variable number of arguments.

int PyArg_ParseTupleAndKeywords(PyObject *args, PyObject *kw, const char *format, char *const *keywords, ...)
   *Part of the Stable ABI.*

   Parse the parameters of a function that takes both positional and keyword parameters into local variables. The *keywords* argument is a "NULL"-terminated array of keyword parameter names specified as null-terminated ASCII or UTF-8 encoded C strings. Empty names denote positional-only parameters. Returns true on success; on failure, it returns false and raises the appropriate exception.

   Note: The *keywords* parameter declaration is char * const * in C and const char * const * in C++. This can be overridden with the "PY_CXX_CONST" macro.

   Changed in version 3.6: Added support for positional-only parameters.

   Changed in version 3.13: The *keywords* parameter now has type char * const * in C and const char * const * in C++, instead of char**. Added support for non-ASCII keyword parameter names.

int PyArg_VaParseTupleAndKeywords(PyObject *args, PyObject *kw, const char *format, char *const *keywords, va_list vargs)
   *Part of the Stable ABI.*

   Identical to "PyArg_ParseTupleAndKeywords()", except that it accepts a va_list rather than a variable number of arguments.

int PyArg_ValidateKeywordArguments(PyObject*)
   *Part of the Stable ABI.*

   Ensure that the keys in the keywords argument dictionary are strings. This is only needed if "PyArg_ParseTupleAndKeywords()" is not used, since the latter already does this check.

   Added in version 3.2.

int PyArg_Parse(PyObject *args, const char *format, ...)
   *Part of the Stable ABI.*

   Parse the parameter of a function that takes a single positional parameter into a local variable. Returns true on success; on failure, it returns false and raises the appropriate exception.

   Example:

      // Function using METH_O calling convention
      static PyObject*
      my_function(PyObject *module, PyObject *arg)
      {
          int value;
          if (!PyArg_Parse(arg, "i:my_function", &value)) {
              return NULL;
          }
          // ... use value ...
      }

int PyArg_UnpackTuple(PyObject *args, const char *name, Py_ssize_t min, Py_ssize_t max, ...)
   *Part of the Stable ABI.*

   A simpler form of parameter retrieval which does not use a format string to specify the types of the arguments.

   Functions which use this method to retrieve their parameters should be declared as "METH_VARARGS" in function or method tables. The tuple containing the actual parameters should be passed as *args*; it must actually be a tuple. The length of the tuple must be at least *min* and no more than *max*; *min* and *max* may be equal. Additional arguments must be passed to the function, each of which should be a pointer to a PyObject* variable; these will be filled in with the values from *args*; they will contain *borrowed references*. The variables which correspond to optional parameters not given by *args* will not be filled in; these should be initialized by the caller. This function returns true on success and false if *args* is not a tuple or contains the wrong number of elements; an exception will be set if there was a failure.

   This is an example of the use of this function, taken from the sources for the "_weakref" helper module for weak references:

      static PyObject *
      weakref_ref(PyObject *self, PyObject *args)
      {
          PyObject *object;
          PyObject *callback = NULL;
          PyObject *result = NULL;

          if (PyArg_UnpackTuple(args, "ref", 1, 2, &object, &callback)) {
              result = PyWeakref_NewRef(object, callback);
          }
          return result;
      }

   The call to "PyArg_UnpackTuple()" in this example is entirely equivalent to this call to "PyArg_ParseTuple()":

      PyArg_ParseTuple(args, "O|O:ref", &object, &callback)

PY_CXX_CONST
   The value to be inserted, if any, before char * const * in the *keywords* parameter declaration of "PyArg_ParseTupleAndKeywords()" and "PyArg_VaParseTupleAndKeywords()". Default empty for C and "const" for C++ (const char * const *). To override, define it to the desired value before including "Python.h".

   Added in version 3.13.


Building values
===============

PyObject *Py_BuildValue(const char *format, ...)
   *Return value: New reference.* *Part of the Stable ABI.*

   Create a new value based on a format string similar to those accepted by the "PyArg_Parse*" family of functions and a sequence of values. Returns the value or "NULL" in the case of an error; an exception will be raised if "NULL" is returned.

   "Py_BuildValue()" does not always build a tuple. It builds a tuple only if its format string contains two or more format units. If the format string is empty, it returns "None"; if it contains exactly one format unit, it returns whatever object is described by that format unit. To force it to return a tuple of size 0 or one, parenthesize the format string.

   When memory buffers are passed as parameters to supply data to build objects, as for the "s" and "s#" formats, the required data is copied. Buffers provided by the caller are never referenced by the objects created by "Py_BuildValue()". In other words, if your code invokes "malloc()" and passes the allocated memory to "Py_BuildValue()", your code is responsible for calling "free()" for that memory once "Py_BuildValue()" returns.

   In the following description, the quoted form is the format unit; the entry in (round) parentheses is the Python object type that the format unit will return; and the entry in [square] brackets is the type of the C value(s) to be passed.

   The characters space, tab, colon and comma are ignored in format strings (but not within format units such as "s#"). This can be used to make long format strings a tad more readable.

   "s" ("str" or "None") [const char *]
      Convert a null-terminated C string to a Python "str" object using "'utf-8'" encoding. If the C string pointer is "NULL", "None" is used.
"s#" ("str" or "None") [const char *, "Py_ssize_t"] Convert a C string and its length to a Python "str" object using "'utf-8'" encoding. If the C string pointer is "NULL", the length is ignored and "None" is returned. "y" ("bytes") [const char *] This converts a C string to a Python "bytes" object. If the C string pointer is "NULL", "None" is returned. "y#" ("bytes") [const char *, "Py_ssize_t"] This converts a C string and its lengths to a Python object. If the C string pointer is "NULL", "None" is returned. "z" ("str" or "None") [const char *] Same as "s". "z#" ("str" or "None") [const char *, "Py_ssize_t"] Same as "s#". "u" ("str") [const wchar_t *] Convert a null-terminated "wchar_t" buffer of Unicode (UTF-16 or UCS-4) data to a Python Unicode object. If the Unicode buffer pointer is "NULL", "None" is returned. "u#" ("str") [const wchar_t *, "Py_ssize_t"] Convert a Unicode (UTF-16 or UCS-4) data buffer and its length to a Python Unicode object. If the Unicode buffer pointer is "NULL", the length is ignored and "None" is returned. "U" ("str" or "None") [const char *] Same as "s". "U#" ("str" or "None") [const char *, "Py_ssize_t"] Same as "s#". "i" ("int") [int] Convert a plain C int to a Python integer object. "b" ("int") [char] Convert a plain C char to a Python integer object. "h" ("int") [short int] Convert a plain C short int to a Python integer object. "l" ("int") [long int] Convert a C long int to a Python integer object. "B" ("int") [unsigned char] Convert a C unsigned char to a Python integer object. "H" ("int") [unsigned short int] Convert a C unsigned short int to a Python integer object. "I" ("int") [unsigned int] Convert a C unsigned int to a Python integer object. "k" ("int") [unsigned long] Convert a C unsigned long to a Python integer object. "L" ("int") [long long] Convert a C long long to a Python integer object. "K" ("int") [unsigned long long] Convert a C unsigned long long to a Python integer object. "n" ("int") ["Py_ssize_t"] Convert a C "Py_ssize_t" to a Python integer. "c" ("bytes" of length 1) [char] Convert a C int representing a byte to a Python "bytes" object of length 1. "C" ("str" of length 1) [int] Convert a C int representing a character to Python "str" object of length 1. "d" ("float") [double] Convert a C double to a Python floating-point number. "f" ("float") [float] Convert a C float to a Python floating-point number. "D" ("complex") [Py_complex *] Convert a C "Py_complex" structure to a Python complex number. "O" (object) [PyObject *] Pass a Python object untouched but create a new *strong reference* to it (i.e. its reference count is incremented by one). If the object passed in is a "NULL" pointer, it is assumed that this was caused because the call producing the argument found an error and set an exception. Therefore, "Py_BuildValue()" will return "NULL" but won’t raise an exception. If no exception has been raised yet, "SystemError" is set. "S" (object) [PyObject *] Same as "O". "N" (object) [PyObject *] Same as "O", except it doesn’t create a new *strong reference*. Useful when the object is created by a call to an object constructor in the argument list. "O&" (object) [*converter*, *anything*] Convert *anything* to a Python object through a *converter* function. The function is called with *anything* (which should be compatible with void*) as its argument and should return a “new” Python object, or "NULL" if an error occurred. "(items)" ("tuple") [*matching-items*] Convert a sequence of C values to a Python tuple with the same number of items. 
"[items]" ("list") [*matching-items*] Convert a sequence of C values to a Python list with the same number of items. "{items}" ("dict") [*matching-items*] Convert a sequence of C values to a Python dictionary. Each pair of consecutive C values adds one item to the dictionary, serving as key and value, respectively. If there is an error in the format string, the "SystemError" exception is set and "NULL" returned. PyObject *Py_VaBuildValue(const char *format, va_list vargs) *Return value: New reference.** Part of the Stable ABI.* Identical to "Py_BuildValue()", except that it accepts a va_list rather than a variable number of arguments. Boolean Objects *************** Booleans in Python are implemented as a subclass of integers. There are only two booleans, "Py_False" and "Py_True". As such, the normal creation and deletion functions don’t apply to booleans. The following macros are available, however. PyTypeObject PyBool_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python boolean type; it is the same object as "bool" in the Python layer. int PyBool_Check(PyObject *o) Return true if *o* is of type "PyBool_Type". This function always succeeds. PyObject *Py_False The Python "False" object. This object has no methods and is *immortal*. Changed in version 3.12: "Py_False" is *immortal*. PyObject *Py_True The Python "True" object. This object has no methods and is *immortal*. Changed in version 3.12: "Py_True" is *immortal*. Py_RETURN_FALSE Return "Py_False" from a function. Py_RETURN_TRUE Return "Py_True" from a function. PyObject *PyBool_FromLong(long v) *Return value: New reference.** Part of the Stable ABI.* Return "Py_True" or "Py_False", depending on the truth value of *v*. Buffer Protocol *************** Certain objects available in Python wrap access to an underlying memory array or *buffer*. Such objects include the built-in "bytes" and "bytearray", and some extension types like "array.array". Third- party libraries may define their own types for special purposes, such as image processing or numeric analysis. While each of these types have their own semantics, they share the common characteristic of being backed by a possibly large memory buffer. It is then desirable, in some situations, to access that buffer directly and without intermediate copying. Python provides such a facility at the C and Python level in the form of the buffer protocol. This protocol has two sides: * on the producer side, a type can export a “buffer interface” which allows objects of that type to expose information about their underlying buffer. This interface is described in the section Buffer Object Structures; for Python see Emulating buffer types. * on the consumer side, several means are available to obtain a pointer to the raw underlying data of an object (for example a method parameter). For Python see "memoryview". Simple objects such as "bytes" and "bytearray" expose their underlying buffer in byte-oriented form. Other forms are possible; for example, the elements exposed by an "array.array" can be multi-byte values. An example consumer of the buffer interface is the "write()" method of file objects: any object that can export a series of bytes through the buffer interface can be written to a file. While "write()" only needs read-only access to the internal contents of the object passed to it, other methods such as "readinto()" need write access to the contents of their argument. 

The buffer interface allows objects to selectively allow or reject exporting of read-write and read-only buffers.

There are two ways for a consumer of the buffer interface to acquire a buffer over a target object:

* call "PyObject_GetBuffer()" with the right parameters;

* call "PyArg_ParseTuple()" (or one of its siblings) with one of the "y*", "w*" or "s*" format codes.

In both cases, "PyBuffer_Release()" must be called when the buffer isn't needed anymore. Failure to do so could lead to various issues such as resource leaks.

Added in version 3.12: The buffer protocol is now accessible in Python, see Emulating buffer types and "memoryview".


Buffer structure
================

Buffer structures (or simply "buffers") are useful as a way to expose the binary data from another object to the Python programmer. They can also be used as a zero-copy slicing mechanism. Using their ability to reference a block of memory, it is possible to expose any data to the Python programmer quite easily. The memory could be a large, constant array in a C extension, it could be a raw block of memory for manipulation before passing to an operating system library, or it could be used to pass around structured data in its native, in-memory format.

Contrary to most data types exposed by the Python interpreter, buffers are not "PyObject" pointers but rather simple C structures. This allows them to be created and copied very simply. When a generic wrapper around a buffer is needed, a memoryview object can be created.

For short instructions on how to write an exporting object, see Buffer Object Structures. For obtaining a buffer, see "PyObject_GetBuffer()".

type Py_buffer
   *Part of the Stable ABI (including all members) since version 3.11.*

   void *buf
      A pointer to the start of the logical structure described by the buffer fields. This can be any location within the underlying physical memory block of the exporter. For example, with negative "strides" the value may point to the end of the memory block.

      For *contiguous* arrays, the value points to the beginning of the memory block.

   PyObject *obj
      A new reference to the exporting object. The reference is owned by the consumer and automatically released (i.e. reference count decremented) and set to "NULL" by "PyBuffer_Release()". The field is the equivalent of the return value of any standard C-API function.

      As a special case, for *temporary* buffers that are wrapped by "PyMemoryView_FromBuffer()" or "PyBuffer_FillInfo()" this field is "NULL". In general, exporting objects MUST NOT use this scheme.

   Py_ssize_t len
      "product(shape) * itemsize". For contiguous arrays, this is the length of the underlying memory block. For non-contiguous arrays, it is the length that the logical structure would have if it were copied to a contiguous representation.

      Accessing "((char *)buf)[0] up to ((char *)buf)[len-1]" is only valid if the buffer has been obtained by a request that guarantees contiguity. In most cases such a request will be "PyBUF_SIMPLE" or "PyBUF_WRITABLE".

   int readonly
      An indicator of whether the buffer is read-only. This field is controlled by the "PyBUF_WRITABLE" flag.

   Py_ssize_t itemsize
      Item size in bytes of a single element. Same as the value of "struct.calcsize()" called on non-"NULL" "format" values.

      Important exception: If a consumer requests a buffer without the "PyBUF_FORMAT" flag, "format" will be set to "NULL", but "itemsize" still has the value for the original format.
If "shape" is present, the equality "product(shape) * itemsize == len" still holds and the consumer can use "itemsize" to navigate the buffer. If "shape" is "NULL" as a result of a "PyBUF_SIMPLE" or a "PyBUF_WRITABLE" request, the consumer must disregard "itemsize" and assume "itemsize == 1". char *format A *NULL* terminated string in "struct" module style syntax describing the contents of a single item. If this is "NULL", ""B"" (unsigned bytes) is assumed. This field is controlled by the "PyBUF_FORMAT" flag. int ndim The number of dimensions the memory represents as an n-dimensional array. If it is "0", "buf" points to a single item representing a scalar. In this case, "shape", "strides" and "suboffsets" MUST be "NULL". The maximum number of dimensions is given by "PyBUF_MAX_NDIM". Py_ssize_t *shape An array of "Py_ssize_t" of length "ndim" indicating the shape of the memory as an n-dimensional array. Note that "shape[0] * ... * shape[ndim-1] * itemsize" MUST be equal to "len". Shape values are restricted to "shape[n] >= 0". The case "shape[n] == 0" requires special attention. See complex arrays for further information. The shape array is read-only for the consumer. Py_ssize_t *strides An array of "Py_ssize_t" of length "ndim" giving the number of bytes to skip to get to a new element in each dimension. Stride values can be any integer. For regular arrays, strides are usually positive, but a consumer MUST be able to handle the case "strides[n] <= 0". See complex arrays for further information. The strides array is read-only for the consumer. Py_ssize_t *suboffsets An array of "Py_ssize_t" of length "ndim". If "suboffsets[n] >= 0", the values stored along the nth dimension are pointers and the suboffset value dictates how many bytes to add to each pointer after de-referencing. A suboffset value that is negative indicates that no de-referencing should occur (striding in a contiguous memory block). If all suboffsets are negative (i.e. no de-referencing is needed), then this field must be "NULL" (the default value). This type of array representation is used by the Python Imaging Library (PIL). See complex arrays for further information how to access elements of such an array. The suboffsets array is read-only for the consumer. void *internal This is for use internally by the exporting object. For example, this might be re-cast as an integer by the exporter and used to store flags about whether or not the shape, strides, and suboffsets arrays must be freed when the buffer is released. The consumer MUST NOT alter this value. Constants: PyBUF_MAX_NDIM The maximum number of dimensions the memory represents. Exporters MUST respect this limit, consumers of multi-dimensional buffers SHOULD be able to handle up to "PyBUF_MAX_NDIM" dimensions. Currently set to 64. Buffer request types ==================== Buffers are usually obtained by sending a buffer request to an exporting object via "PyObject_GetBuffer()". Since the complexity of the logical structure of the memory can vary drastically, the consumer uses the *flags* argument to specify the exact buffer type it can handle. All "Py_buffer" fields are unambiguously defined by the request type. request-independent fields -------------------------- The following fields are not influenced by *flags* and must always be filled in with the correct values: "obj", "buf", "len", "itemsize", "ndim". readonly, format ---------------- PyBUF_WRITABLE Controls the "readonly" field. If set, the exporter MUST provide a writable buffer or else report failure. 

   Otherwise, the exporter MAY provide either a read-only or writable buffer, but the choice MUST be consistent for all consumers. For example, PyBUF_SIMPLE | PyBUF_WRITABLE can be used to request a simple writable buffer.

PyBUF_FORMAT
   Controls the "format" field. If set, this field MUST be filled in correctly. Otherwise, this field MUST be "NULL".

"PyBUF_WRITABLE" can be |'d to any of the flags in the next section. Since "PyBUF_SIMPLE" is defined as 0, "PyBUF_WRITABLE" can be used as a stand-alone flag to request a simple writable buffer.

"PyBUF_FORMAT" can be |'d to any of the flags except "PyBUF_SIMPLE", because the latter already implies format "B" (unsigned bytes). "PyBUF_FORMAT" cannot be used on its own.


shape, strides, suboffsets
--------------------------

The flags that control the logical structure of the memory are listed in decreasing order of complexity. Note that each flag contains all bits of the flags below it.

+-------------------------------+---------+-----------+--------------+
| Request                       | shape   | strides   | suboffsets   |
|===============================|=========|===========|==============|
| PyBUF_INDIRECT                | yes     | yes       | if needed    |
+-------------------------------+---------+-----------+--------------+
| PyBUF_STRIDES                 | yes     | yes       | NULL         |
+-------------------------------+---------+-----------+--------------+
| PyBUF_ND                      | yes     | NULL      | NULL         |
+-------------------------------+---------+-----------+--------------+
| PyBUF_SIMPLE                  | NULL    | NULL      | NULL         |
+-------------------------------+---------+-----------+--------------+


contiguity requests
-------------------

C or Fortran *contiguity* can be explicitly requested, with and without stride information. Without stride information, the buffer must be C-contiguous.

+-------------------------------------+---------+-----------+--------------+----------+
| Request                             | shape   | strides   | suboffsets   | contig   |
|=====================================|=========|===========|==============|==========|
| PyBUF_C_CONTIGUOUS                  | yes     | yes       | NULL         | C        |
+-------------------------------------+---------+-----------+--------------+----------+
| PyBUF_F_CONTIGUOUS                  | yes     | yes       | NULL         | F        |
+-------------------------------------+---------+-----------+--------------+----------+
| PyBUF_ANY_CONTIGUOUS                | yes     | yes       | NULL         | C or F   |
+-------------------------------------+---------+-----------+--------------+----------+
| "PyBUF_ND"                          | yes     | NULL      | NULL         | C        |
+-------------------------------------+---------+-----------+--------------+----------+


compound requests
-----------------

All possible requests are fully defined by some combination of the flags in the previous section. For convenience, the buffer protocol provides frequently used combinations as single flags.

In the following table *U* stands for undefined contiguity. The consumer would have to call "PyBuffer_IsContiguous()" to determine contiguity.

+---------------------------------+---------+-----------+--------------+----------+------------+----------+
| Request                         | shape   | strides   | suboffsets   | contig   | readonly   | format   |
|=================================|=========|===========|==============|==========|============|==========|
| PyBUF_FULL                      | yes     | yes       | if needed    | U        | 0          | yes      |
+---------------------------------+---------+-----------+--------------+----------+------------+----------+
| PyBUF_FULL_RO                   | yes     | yes       | if needed    | U        | 1 or 0     | yes      |
+---------------------------------+---------+-----------+--------------+----------+------------+----------+
| PyBUF_RECORDS                   | yes     | yes       | NULL         | U        | 0          | yes      |
+---------------------------------+---------+-----------+--------------+----------+------------+----------+
| PyBUF_RECORDS_RO                | yes     | yes       | NULL         | U        | 1 or 0     | yes      |
+---------------------------------+---------+-----------+--------------+----------+------------+----------+
| PyBUF_STRIDED                   | yes     | yes       | NULL         | U        | 0          | NULL     |
+---------------------------------+---------+-----------+--------------+----------+------------+----------+
| PyBUF_STRIDED_RO                | yes     | yes       | NULL         | U        | 1 or 0     | NULL     |
+---------------------------------+---------+-----------+--------------+----------+------------+----------+
| PyBUF_CONTIG                    | yes     | NULL      | NULL         | C        | 0          | NULL     |
+---------------------------------+---------+-----------+--------------+----------+------------+----------+
| PyBUF_CONTIG_RO                 | yes     | NULL      | NULL         | C        | 1 or 0     | NULL     |
+---------------------------------+---------+-----------+--------------+----------+------------+----------+


Complex arrays
==============

NumPy-style: shape and strides
------------------------------

The logical structure of NumPy-style arrays is defined by "itemsize", "ndim", "shape" and "strides".

If "ndim == 0", the memory location pointed to by "buf" is interpreted as a scalar of size "itemsize". In that case, both "shape" and "strides" are "NULL".

If "strides" is "NULL", the array is interpreted as a standard n-dimensional C-array. Otherwise, the consumer must access an n-dimensional array as follows:

   ptr = (char *)buf + indices[0] * strides[0] + ... + indices[n-1] * strides[n-1];
   item = *((typeof(item) *)ptr);

As noted above, "buf" can point to any location within the actual memory block. An exporter can check the validity of a buffer with this function:

   def verify_structure(memlen, itemsize, ndim, shape, strides, offset):
       """Verify that the parameters represent a valid array within
          the bounds of the allocated memory:
              char *mem: start of the physical memory block
              memlen: length of the physical memory block
              offset: (char *)buf - mem
       """
       if offset % itemsize:
           return False
       if offset < 0 or offset+itemsize > memlen:
           return False
       if any(v % itemsize for v in strides):
           return False

       if ndim <= 0:
           return ndim == 0 and not shape and not strides
       if 0 in shape:
           return True

       imin = sum(strides[j]*(shape[j]-1) for j in range(ndim)
                  if strides[j] <= 0)
       imax = sum(strides[j]*(shape[j]-1) for j in range(ndim)
                  if strides[j] > 0)

       return 0 <= offset+imin and offset+imax+itemsize <= memlen


PIL-style: shape, strides and suboffsets
----------------------------------------

In addition to the regular items, PIL-style arrays can contain pointers that must be followed in order to get to the next element in a dimension. For example, the regular three-dimensional C-array "char v[2][2][3]" can also be viewed as an array of 2 pointers to 2 two-dimensional arrays: "char (*v[2])[2][3]".

In suboffsets representation, those two pointers can be embedded at the start of "buf", pointing to two "char x[2][3]" arrays that can be located anywhere in memory.

Here is a function that returns a pointer to the element in an N-D array pointed to by an N-dimensional index when there are both non-"NULL" strides and suboffsets:

   void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                          Py_ssize_t *suboffsets, Py_ssize_t *indices) {
       char *pointer = (char*)buf;
       int i;
       for (i = 0; i < ndim; i++) {
           pointer += strides[i] * indices[i];
           if (suboffsets[i] >= 0) {
               pointer = *((char**)pointer) + suboffsets[i];
           }
       }
       return (void*)pointer;
   }


Buffer-related functions
========================

int PyObject_CheckBuffer(PyObject *obj)
   *Part of the Stable ABI since version 3.11.*

   Return "1" if *obj* supports the buffer interface, otherwise "0". When "1" is returned, it doesn't guarantee that "PyObject_GetBuffer()" will succeed. This function always succeeds.

int PyObject_GetBuffer(PyObject *exporter, Py_buffer *view, int flags)
   *Part of the Stable ABI since version 3.11.*

   Send a request to *exporter* to fill in *view* as specified by *flags*. If the exporter cannot provide a buffer of the exact type, it MUST raise "BufferError", set "view->obj" to "NULL" and return "-1".

   On success, fill in *view*, set "view->obj" to a new reference to *exporter* and return 0. In the case of chained buffer providers that redirect requests to a single object, "view->obj" MAY refer to this object instead of *exporter* (see Buffer Object Structures).

   Successful calls to "PyObject_GetBuffer()" must be paired with calls to "PyBuffer_Release()", similar to "malloc()" and "free()". Thus, after the consumer is done with the buffer, "PyBuffer_Release()" must be called exactly once.

void PyBuffer_Release(Py_buffer *view)
   *Part of the Stable ABI since version 3.11.*

   Release the buffer *view* and release the *strong reference* (i.e. decrement the reference count) to the view's supporting object, "view->obj". This function MUST be called when the buffer is no longer being used, otherwise reference leaks may occur.

   It is an error to call this function on a buffer that was not obtained via "PyObject_GetBuffer()".

Py_ssize_t PyBuffer_SizeFromFormat(const char *format)
   *Part of the Stable ABI since version 3.11.*

   Return the implied "itemsize" from "format". On error, raise an exception and return -1.

   Added in version 3.9.

int PyBuffer_IsContiguous(const Py_buffer *view, char order)
   *Part of the Stable ABI since version 3.11.*

   Return "1" if the memory defined by the *view* is C-style (*order* is "'C'") or Fortran-style (*order* is "'F'") *contiguous* or either one (*order* is "'A'"). Return "0" otherwise. This function always succeeds.

void *PyBuffer_GetPointer(const Py_buffer *view, const Py_ssize_t *indices)
   *Part of the Stable ABI since version 3.11.*

   Get the memory area pointed to by the *indices* inside the given *view*. *indices* must point to an array of "view->ndim" indices.

int PyBuffer_FromContiguous(const Py_buffer *view, const void *buf, Py_ssize_t len, char fort)
   *Part of the Stable ABI since version 3.11.*

   Copy contiguous *len* bytes from *buf* to *view*. *fort* can be "'C'" or "'F'" (for C-style or Fortran-style ordering). "0" is returned on success, "-1" on error.

int PyBuffer_ToContiguous(void *buf, const Py_buffer *src, Py_ssize_t len, char order)
   *Part of the Stable ABI since version 3.11.*

   Copy *len* bytes from *src* to its contiguous representation in *buf*.
*order* can be "'C'" or "'F'" or "'A'" (for C-style or Fortran-style ordering or either one). "0" is returned on success, "-1" on error. This function fails if *len* != *src->len*.

int PyObject_CopyData(PyObject *dest, PyObject *src) * Part of the Stable ABI since version 3.11.* Copy data from *src* to *dest* buffer. Can convert between C-style and Fortran-style buffers. "0" is returned on success, "-1" on error.

void PyBuffer_FillContiguousStrides(int ndims, Py_ssize_t *shape, Py_ssize_t *strides, int itemsize, char order) * Part of the Stable ABI since version 3.11.* Fill the *strides* array with byte-strides of a *contiguous* (C-style if *order* is "'C'" or Fortran-style if *order* is "'F'") array of the given shape with the given number of bytes per element.

int PyBuffer_FillInfo(Py_buffer *view, PyObject *exporter, void *buf, Py_ssize_t len, int readonly, int flags) * Part of the Stable ABI since version 3.11.* Handle buffer requests for an exporter that wants to expose *buf* of size *len* with writability set according to *readonly*. *buf* is interpreted as a sequence of unsigned bytes. The *flags* argument indicates the request type. This function always fills in *view* as specified by flags, unless *buf* has been designated as read-only and "PyBUF_WRITABLE" is set in *flags*. On success, set "view->obj" to a new reference to *exporter* and return 0. Otherwise, raise "BufferError", set "view->obj" to "NULL" and return "-1". If this function is used as part of a getbufferproc, *exporter* MUST be set to the exporting object and *flags* must be passed unmodified. Otherwise, *exporter* MUST be "NULL".

Byte Array Objects
******************

type PyByteArrayObject This subtype of "PyObject" represents a Python bytearray object.

PyTypeObject PyByteArray_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python bytearray type; it is the same object as "bytearray" in the Python layer.

Type check macros
=================

int PyByteArray_Check(PyObject *o) Return true if the object *o* is a bytearray object or an instance of a subtype of the bytearray type. This function always succeeds.

int PyByteArray_CheckExact(PyObject *o) Return true if the object *o* is a bytearray object, but not an instance of a subtype of the bytearray type. This function always succeeds.

Direct API functions
====================

PyObject *PyByteArray_FromObject(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Return a new bytearray object from any object, *o*, that implements the buffer protocol. On failure, return "NULL" with an exception set.

PyObject *PyByteArray_FromStringAndSize(const char *string, Py_ssize_t len) *Return value: New reference.** Part of the Stable ABI.* Create a new bytearray object from *string* and its length, *len*. On failure, return "NULL" with an exception set.

PyObject *PyByteArray_Concat(PyObject *a, PyObject *b) *Return value: New reference.** Part of the Stable ABI.* Concatenate bytearrays *a* and *b* and return a new bytearray with the result. On failure, return "NULL" with an exception set.

Py_ssize_t PyByteArray_Size(PyObject *bytearray) * Part of the Stable ABI.* Return the size of *bytearray* after checking for a "NULL" pointer.

char *PyByteArray_AsString(PyObject *bytearray) * Part of the Stable ABI.* Return the contents of *bytearray* as a char array after checking for a "NULL" pointer. The returned array always has an extra null byte appended.
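The following is an illustrative sketch, not part of the reference above: it combines "PyByteArray_FromStringAndSize()", "PyByteArray_Resize()" (documented next) and "PyByteArray_AsString()" to build "bytearray(b'hello world')" in C. The helper name "make_greeting" is hypothetical.

   #include <Python.h>
   #include <string.h>

   static PyObject *
   make_greeting(void)
   {
       PyObject *ba = PyByteArray_FromStringAndSize("hello", 5);
       if (ba == NULL) {
           return NULL;
       }
       /* Grow the buffer; the existing bytes are preserved. */
       if (PyByteArray_Resize(ba, 11) < 0) {
           Py_DECREF(ba);
           return NULL;
       }
       /* The pointer is only valid until the bytearray is resized
          or deallocated. */
       char *data = PyByteArray_AsString(ba);
       if (data == NULL) {
           Py_DECREF(ba);
           return NULL;
       }
       memcpy(data + 5, " world", 6);
       return ba;
   }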
int PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len) * Part of the Stable ABI.* Resize the internal buffer of *bytearray* to *len*. Macros ====== These macros trade safety for speed and they don’t check pointers. char *PyByteArray_AS_STRING(PyObject *bytearray) Similar to "PyByteArray_AsString()", but without error checking. Py_ssize_t PyByteArray_GET_SIZE(PyObject *bytearray) Similar to "PyByteArray_Size()", but without error checking. Bytes Objects ************* These functions raise "TypeError" when expecting a bytes parameter and called with a non-bytes parameter. type PyBytesObject This subtype of "PyObject" represents a Python bytes object. PyTypeObject PyBytes_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python bytes type; it is the same object as "bytes" in the Python layer. int PyBytes_Check(PyObject *o) Return true if the object *o* is a bytes object or an instance of a subtype of the bytes type. This function always succeeds. int PyBytes_CheckExact(PyObject *o) Return true if the object *o* is a bytes object, but not an instance of a subtype of the bytes type. This function always succeeds. PyObject *PyBytes_FromString(const char *v) *Return value: New reference.** Part of the Stable ABI.* Return a new bytes object with a copy of the string *v* as value on success, and "NULL" on failure. The parameter *v* must not be "NULL"; it will not be checked. PyObject *PyBytes_FromStringAndSize(const char *v, Py_ssize_t len) *Return value: New reference.** Part of the Stable ABI.* Return a new bytes object with a copy of the string *v* as value and length *len* on success, and "NULL" on failure. If *v* is "NULL", the contents of the bytes object are uninitialized. PyObject *PyBytes_FromFormat(const char *format, ...) *Return value: New reference.** Part of the Stable ABI.* Take a C "printf()"-style *format* string and a variable number of arguments, calculate the size of the resulting Python bytes object and return a bytes object with the values formatted into it. The variable arguments must be C types and must correspond exactly to the format characters in the *format* string. The following format characters are allowed: +---------------------+-----------------+----------------------------------+ | Format Characters | Type | Comment | |=====================|=================|==================================| | "%%" | *n/a* | The literal % character. | +---------------------+-----------------+----------------------------------+ | "%c" | int | A single byte, represented as a | | | | C int. | +---------------------+-----------------+----------------------------------+ | "%d" | int | Equivalent to "printf("%d")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%u" | unsigned int | Equivalent to "printf("%u")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%ld" | long | Equivalent to "printf("%ld")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%lu" | unsigned long | Equivalent to "printf("%lu")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%zd" | "Py_ssize_t" | Equivalent to "printf("%zd")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%zu" | size_t | Equivalent to "printf("%zu")". 
| | | | [1] | +---------------------+-----------------+----------------------------------+ | "%i" | int | Equivalent to "printf("%i")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%x" | int | Equivalent to "printf("%x")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%s" | const char* | A null-terminated C character | | | | array. | +---------------------+-----------------+----------------------------------+ | "%p" | const void* | The hex representation of a C | | | | pointer. Mostly equivalent to | | | | "printf("%p")" except that it is | | | | guaranteed to start with the | | | | literal "0x" regardless of what | | | | the platform’s "printf" yields. | +---------------------+-----------------+----------------------------------+ An unrecognized format character causes all the rest of the format string to be copied as-is to the result object, and any extra arguments discarded. [1] For integer specifiers (d, u, ld, lu, zd, zu, i, x): the 0-conversion flag has effect even when a precision is given. PyObject *PyBytes_FromFormatV(const char *format, va_list vargs) *Return value: New reference.** Part of the Stable ABI.* Identical to "PyBytes_FromFormat()" except that it takes exactly two arguments. PyObject *PyBytes_FromObject(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Return the bytes representation of object *o* that implements the buffer protocol. Py_ssize_t PyBytes_Size(PyObject *o) * Part of the Stable ABI.* Return the length of the bytes in bytes object *o*. Py_ssize_t PyBytes_GET_SIZE(PyObject *o) Similar to "PyBytes_Size()", but without error checking. char *PyBytes_AsString(PyObject *o) * Part of the Stable ABI.* Return a pointer to the contents of *o*. The pointer refers to the internal buffer of *o*, which consists of "len(o) + 1" bytes. The last byte in the buffer is always null, regardless of whether there are any other null bytes. The data must not be modified in any way, unless the object was just created using "PyBytes_FromStringAndSize(NULL, size)". It must not be deallocated. If *o* is not a bytes object at all, "PyBytes_AsString()" returns "NULL" and raises "TypeError". char *PyBytes_AS_STRING(PyObject *string) Similar to "PyBytes_AsString()", but without error checking. int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length) * Part of the Stable ABI.* Return the null-terminated contents of the object *obj* through the output variables *buffer* and *length*. Returns "0" on success. If *length* is "NULL", the bytes object may not contain embedded null bytes; if it does, the function returns "-1" and a "ValueError" is raised. The buffer refers to an internal buffer of *obj*, which includes an additional null byte at the end (not counted in *length*). The data must not be modified in any way, unless the object was just created using "PyBytes_FromStringAndSize(NULL, size)". It must not be deallocated. If *obj* is not a bytes object at all, "PyBytes_AsStringAndSize()" returns "-1" and raises "TypeError". Changed in version 3.5: Previously, "TypeError" was raised when embedded null bytes were encountered in the bytes object. void PyBytes_Concat(PyObject **bytes, PyObject *newpart) * Part of the Stable ABI.* Create a new bytes object in **bytes* containing the contents of *newpart* appended to *bytes*; the caller will own the new reference. The reference to the old value of *bytes* will be stolen. 
If the new object cannot be created, the old reference to *bytes* will still be discarded and the value of **bytes* will be set to "NULL"; the appropriate exception will be set.

void PyBytes_ConcatAndDel(PyObject **bytes, PyObject *newpart) * Part of the Stable ABI.* Create a new bytes object in **bytes* containing the contents of *newpart* appended to *bytes*. This version releases the *strong reference* to *newpart* (i.e. decrements its reference count).

int _PyBytes_Resize(PyObject **bytes, Py_ssize_t newsize) Resize a bytes object. *newsize* will be the new length of the bytes object. You can think of it as creating a new bytes object and destroying the old one, only more efficiently. Pass the address of an existing bytes object as an lvalue (it may be written into), and the new size desired. On success, **bytes* holds the resized bytes object and "0" is returned; the address in **bytes* may differ from its input value. If the reallocation fails, the original bytes object at **bytes* is deallocated, **bytes* is set to "NULL", "MemoryError" is set, and "-1" is returned.

Call Protocol
*************

CPython supports two different calling protocols: *tp_call* and vectorcall.

The *tp_call* Protocol
======================

Instances of classes that set "tp_call" are callable. The signature of the slot is:

   PyObject *tp_call(PyObject *callable, PyObject *args, PyObject *kwargs);

A call is made using a tuple for the positional arguments and a dict for the keyword arguments, similarly to "callable(*args, **kwargs)" in Python code. *args* must be non-NULL (use an empty tuple if there are no arguments) but *kwargs* may be *NULL* if there are no keyword arguments.

This convention is not only used by *tp_call*: "tp_new" and "tp_init" also pass arguments this way.

To call an object, use "PyObject_Call()" or another call API.

The Vectorcall Protocol
=======================

Added in version 3.9.

The vectorcall protocol was introduced in **PEP 590** as an additional protocol for making calls more efficient. As a rule of thumb, CPython will prefer the vectorcall for internal calls if the callable supports it. However, this is not a hard rule. Additionally, some third-party extensions use *tp_call* directly (rather than using "PyObject_Call()"). Therefore, a class supporting vectorcall must also implement "tp_call". Moreover, the callable must behave the same regardless of which protocol is used. The recommended way to achieve this is by setting "tp_call" to "PyVectorcall_Call()". This bears repeating:

Warning: A class supporting vectorcall **must** also implement "tp_call" with the same semantics.

Changed in version 3.12: The "Py_TPFLAGS_HAVE_VECTORCALL" flag is now removed from a class when the class’s "__call__()" method is reassigned. (This internally sets "tp_call" only, and thus may make it behave differently than the vectorcall function.) In earlier Python versions, vectorcall should only be used with "immutable" or static types.

A class should not implement vectorcall if that would be slower than *tp_call*. For example, if the callee needs to convert the arguments to an args tuple and kwargs dict anyway, then there is no point in implementing vectorcall.

Classes can implement the vectorcall protocol by enabling the "Py_TPFLAGS_HAVE_VECTORCALL" flag and setting "tp_vectorcall_offset" to the offset inside the object structure where a *vectorcallfunc* appears.
This is a pointer to a function with the following signature: typedef PyObject *(*vectorcallfunc)(PyObject *callable, PyObject *const *args, size_t nargsf, PyObject *kwnames) * Part of the Stable ABI since version 3.12.* * *callable* is the object being called. * *args* is a C array consisting of the positional arguments followed by the values of the keyword arguments. This can be *NULL* if there are no arguments. * *nargsf* is the number of positional arguments plus possibly the "PY_VECTORCALL_ARGUMENTS_OFFSET" flag. To get the actual number of positional arguments from *nargsf*, use "PyVectorcall_NARGS()". * *kwnames* is a tuple containing the names of the keyword arguments; in other words, the keys of the kwargs dict. These names must be strings (instances of "str" or a subclass) and they must be unique. If there are no keyword arguments, then *kwnames* can instead be *NULL*. PY_VECTORCALL_ARGUMENTS_OFFSET * Part of the Stable ABI since version 3.12.* If this flag is set in a vectorcall *nargsf* argument, the callee is allowed to temporarily change "args[-1]". In other words, *args* points to argument 1 (not 0) in the allocated vector. The callee must restore the value of "args[-1]" before returning. For "PyObject_VectorcallMethod()", this flag means instead that "args[0]" may be changed. Whenever they can do so cheaply (without additional allocation), callers are encouraged to use "PY_VECTORCALL_ARGUMENTS_OFFSET". Doing so will allow callables such as bound methods to make their onward calls (which include a prepended *self* argument) very efficiently. Added in version 3.8. To call an object that implements vectorcall, use a call API function as with any other callable. "PyObject_Vectorcall()" will usually be most efficient. Recursion Control ----------------- When using *tp_call*, callees do not need to worry about recursion: CPython uses "Py_EnterRecursiveCall()" and "Py_LeaveRecursiveCall()" for calls made using *tp_call*. For efficiency, this is not the case for calls done using vectorcall: the callee should use *Py_EnterRecursiveCall* and *Py_LeaveRecursiveCall* if needed. Vectorcall Support API ---------------------- Py_ssize_t PyVectorcall_NARGS(size_t nargsf) * Part of the Stable ABI since version 3.12.* Given a vectorcall *nargsf* argument, return the actual number of arguments. Currently equivalent to: (Py_ssize_t)(nargsf & ~PY_VECTORCALL_ARGUMENTS_OFFSET) However, the function "PyVectorcall_NARGS" should be used to allow for future extensions. Added in version 3.8. vectorcallfunc PyVectorcall_Function(PyObject *op) If *op* does not support the vectorcall protocol (either because the type does not or because the specific instance does not), return *NULL*. Otherwise, return the vectorcall function pointer stored in *op*. This function never raises an exception. This is mostly useful to check whether or not *op* supports vectorcall, which can be done by checking "PyVectorcall_Function(op) != NULL". Added in version 3.9. PyObject *PyVectorcall_Call(PyObject *callable, PyObject *tuple, PyObject *dict) * Part of the Stable ABI since version 3.12.* Call *callable*’s "vectorcallfunc" with positional and keyword arguments given in a tuple and dict, respectively. This is a specialized function, intended to be put in the "tp_call" slot or be used in an implementation of "tp_call". It does not check the "Py_TPFLAGS_HAVE_VECTORCALL" flag and it does not fall back to "tp_call". Added in version 3.8. 
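To make the protocol concrete, here is a minimal sketch of a vectorcall-capable extension type, modeled on the example in **PEP 590**; the type and function names are hypothetical. The instance’s "vectorcall" member must be set (for example in "tp_new") to "mycallable_vectorcall":

   #include <stddef.h>
   #include <Python.h>

   typedef struct {
       PyObject_HEAD
       vectorcallfunc vectorcall;  /* located via tp_vectorcall_offset */
   } MyCallable;

   /* Trivial behaviour: return the number of positional arguments. */
   static PyObject *
   mycallable_vectorcall(PyObject *self, PyObject *const *args,
                         size_t nargsf, PyObject *kwnames)
   {
       return PyLong_FromSsize_t(PyVectorcall_NARGS(nargsf));
   }

   static PyTypeObject MyCallable_Type = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "example.MyCallable",
       .tp_basicsize = sizeof(MyCallable),
       .tp_vectorcall_offset = offsetof(MyCallable, vectorcall),
       /* tp_call forwards to the vectorcall function, keeping both
          protocols consistent as required above. */
       .tp_call = PyVectorcall_Call,
       .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_VECTORCALL,
   };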
Object Calling API ================== Various functions are available for calling a Python object. Each converts its arguments to a convention supported by the called object – either *tp_call* or vectorcall. In order to do as little conversion as possible, pick one that best fits the format of data you have available. The following table summarizes the available functions; please see individual documentation for details. +--------------------------------------------+--------------------+----------------------+-----------------+ | Function | callable | args | kwargs | |============================================|====================|======================|=================| | "PyObject_Call()" | "PyObject *" | tuple | dict/"NULL" | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallNoArgs()" | "PyObject *" | — | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallOneArg()" | "PyObject *" | 1 object | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallObject()" | "PyObject *" | tuple/"NULL" | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallFunction()" | "PyObject *" | format | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallMethod()" | obj + "char*" | format | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallFunctionObjArgs()" | "PyObject *" | variadic | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallMethodObjArgs()" | obj + name | variadic | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallMethodNoArgs()" | obj + name | — | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallMethodOneArg()" | obj + name | 1 object | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_Vectorcall()" | "PyObject *" | vectorcall | vectorcall | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_VectorcallDict()" | "PyObject *" | vectorcall | dict/"NULL" | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_VectorcallMethod()" | arg + name | vectorcall | vectorcall | +--------------------------------------------+--------------------+----------------------+-----------------+ PyObject *PyObject_Call(PyObject *callable, PyObject *args, PyObject *kwargs) *Return value: New reference.** Part of the Stable ABI.* Call a callable Python object *callable*, with arguments given by the tuple *args*, and named arguments given by the dictionary *kwargs*. *args* must not be *NULL*; use an empty tuple if no arguments are needed. If no named arguments are needed, *kwargs* can be *NULL*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "callable(*args, **kwargs)". 
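As an illustrative sketch (the helper name is hypothetical), the call "callable(1, 2, sep="-")" can be made from C like this:

   static PyObject *
   call_with_kwargs(PyObject *callable)
   {
       PyObject *args = Py_BuildValue("(ii)", 1, 2);
       if (args == NULL) {
           return NULL;
       }
       PyObject *kwargs = Py_BuildValue("{s:s}", "sep", "-");
       if (kwargs == NULL) {
           Py_DECREF(args);
           return NULL;
       }
       PyObject *result = PyObject_Call(callable, args, kwargs);
       Py_DECREF(args);
       Py_DECREF(kwargs);
       return result;  /* NULL with an exception set on failure */
   }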
PyObject *PyObject_CallNoArgs(PyObject *callable) *Return value: New reference.** Part of the Stable ABI since version 3.10.* Call a callable Python object *callable* without any arguments. It is the most efficient way to call a callable Python object without any argument. Return the result of the call on success, or raise an exception and return *NULL* on failure. Added in version 3.9. PyObject *PyObject_CallOneArg(PyObject *callable, PyObject *arg) *Return value: New reference.* Call a callable Python object *callable* with exactly 1 positional argument *arg* and no keyword arguments. Return the result of the call on success, or raise an exception and return *NULL* on failure. Added in version 3.9. PyObject *PyObject_CallObject(PyObject *callable, PyObject *args) *Return value: New reference.** Part of the Stable ABI.* Call a callable Python object *callable*, with arguments given by the tuple *args*. If no arguments are needed, then *args* can be *NULL*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "callable(*args)". PyObject *PyObject_CallFunction(PyObject *callable, const char *format, ...) *Return value: New reference.** Part of the Stable ABI.* Call a callable Python object *callable*, with a variable number of C arguments. The C arguments are described using a "Py_BuildValue()" style format string. The format can be *NULL*, indicating that no arguments are provided. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "callable(*args)". Note that if you only pass PyObject* args, "PyObject_CallFunctionObjArgs()" is a faster alternative. Changed in version 3.4: The type of *format* was changed from "char *". PyObject *PyObject_CallMethod(PyObject *obj, const char *name, const char *format, ...) *Return value: New reference.** Part of the Stable ABI.* Call the method named *name* of object *obj* with a variable number of C arguments. The C arguments are described by a "Py_BuildValue()" format string that should produce a tuple. The format can be *NULL*, indicating that no arguments are provided. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "obj.name(arg1, arg2, ...)". Note that if you only pass PyObject* args, "PyObject_CallMethodObjArgs()" is a faster alternative. Changed in version 3.4: The types of *name* and *format* were changed from "char *". PyObject *PyObject_CallFunctionObjArgs(PyObject *callable, ...) *Return value: New reference.** Part of the Stable ABI.* Call a callable Python object *callable*, with a variable number of PyObject* arguments. The arguments are provided as a variable number of parameters followed by *NULL*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "callable(arg1, arg2, ...)". PyObject *PyObject_CallMethodObjArgs(PyObject *obj, PyObject *name, ...) *Return value: New reference.** Part of the Stable ABI.* Call a method of the Python object *obj*, where the name of the method is given as a Python string object in *name*. It is called with a variable number of PyObject* arguments. The arguments are provided as a variable number of parameters followed by *NULL*. Return the result of the call on success, or raise an exception and return *NULL* on failure. 
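For illustration, a hypothetical helper that performs "obj.update(item)" with "PyObject_CallMethodObjArgs()" might look like this; note the terminating *NULL* in the argument list:

   static int
   call_update(PyObject *obj, PyObject *item)
   {
       PyObject *name = PyUnicode_FromString("update");
       if (name == NULL) {
           return -1;
       }
       PyObject *res = PyObject_CallMethodObjArgs(obj, name, item, NULL);
       Py_DECREF(name);
       if (res == NULL) {
           return -1;
       }
       Py_DECREF(res);
       return 0;
   }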
PyObject *PyObject_CallMethodNoArgs(PyObject *obj, PyObject *name) Call a method of the Python object *obj* without arguments, where the name of the method is given as a Python string object in *name*. Return the result of the call on success, or raise an exception and return *NULL* on failure. Added in version 3.9.

PyObject *PyObject_CallMethodOneArg(PyObject *obj, PyObject *name, PyObject *arg) Call a method of the Python object *obj* with a single positional argument *arg*, where the name of the method is given as a Python string object in *name*. Return the result of the call on success, or raise an exception and return *NULL* on failure. Added in version 3.9.

PyObject *PyObject_Vectorcall(PyObject *callable, PyObject *const *args, size_t nargsf, PyObject *kwnames) * Part of the Stable ABI since version 3.12.* Call a callable Python object *callable*. The arguments are the same as for "vectorcallfunc". If *callable* supports vectorcall, this directly calls the vectorcall function stored in *callable*. Return the result of the call on success, or raise an exception and return *NULL* on failure. Added in version 3.9.

PyObject *PyObject_VectorcallDict(PyObject *callable, PyObject *const *args, size_t nargsf, PyObject *kwdict) Call *callable* with positional arguments passed exactly as in the vectorcall protocol, but with keyword arguments passed as a dictionary *kwdict*. The *args* array contains only the positional arguments. Regardless of which protocol is used internally, a conversion of arguments needs to be done. Therefore, this function should only be used if the caller already has a dictionary ready to use for the keyword arguments, but not a tuple for the positional arguments. Added in version 3.9.

PyObject *PyObject_VectorcallMethod(PyObject *name, PyObject *const *args, size_t nargsf, PyObject *kwnames) * Part of the Stable ABI since version 3.12.* Call a method using the vectorcall calling convention. The name of the method is given as a Python string *name*. The object whose method is called is *args[0]*, and the *args* array starting at *args[1]* represents the arguments of the call. There must be at least one positional argument. *nargsf* is the number of positional arguments including *args[0]*, plus "PY_VECTORCALL_ARGUMENTS_OFFSET" if the value of "args[0]" may temporarily be changed. Keyword arguments can be passed just like in "PyObject_Vectorcall()". If the object has the "Py_TPFLAGS_METHOD_DESCRIPTOR" feature, this will call the unbound method object with the full *args* vector as arguments. Return the result of the call on success, or raise an exception and return *NULL* on failure. Added in version 3.9.

Call Support API
================

int PyCallable_Check(PyObject *o) * Part of the Stable ABI.* Determine if the object *o* is callable. Return "1" if the object is callable and "0" otherwise. This function always succeeds.

Capsules
********

Refer to Providing a C API for an Extension Module for more information on using these objects.

Added in version 3.1.

type PyCapsule This subtype of "PyObject" represents an opaque value, useful for C extension modules that need to pass an opaque value (as a void* pointer) through Python code to other C code. It is often used to make a C function pointer defined in one module available to other modules, so the regular import mechanism can be used to access C APIs defined in dynamically loaded modules.

type PyCapsule_Destructor * Part of the Stable ABI.* The type of a destructor callback for a capsule.
Defined as: typedef void (*PyCapsule_Destructor)(PyObject *); See "PyCapsule_New()" for the semantics of PyCapsule_Destructor callbacks. int PyCapsule_CheckExact(PyObject *p) Return true if its argument is a "PyCapsule". This function always succeeds. PyObject *PyCapsule_New(void *pointer, const char *name, PyCapsule_Destructor destructor) *Return value: New reference.** Part of the Stable ABI.* Create a "PyCapsule" encapsulating the *pointer*. The *pointer* argument may not be "NULL". On failure, set an exception and return "NULL". The *name* string may either be "NULL" or a pointer to a valid C string. If non-"NULL", this string must outlive the capsule. (Though it is permitted to free it inside the *destructor*.) If the *destructor* argument is not "NULL", it will be called with the capsule as its argument when it is destroyed. If this capsule will be stored as an attribute of a module, the *name* should be specified as "modulename.attributename". This will enable other modules to import the capsule using "PyCapsule_Import()". void *PyCapsule_GetPointer(PyObject *capsule, const char *name) * Part of the Stable ABI.* Retrieve the *pointer* stored in the capsule. On failure, set an exception and return "NULL". The *name* parameter must compare exactly to the name stored in the capsule. If the name stored in the capsule is "NULL", the *name* passed in must also be "NULL". Python uses the C function "strcmp()" to compare capsule names. PyCapsule_Destructor PyCapsule_GetDestructor(PyObject *capsule) * Part of the Stable ABI.* Return the current destructor stored in the capsule. On failure, set an exception and return "NULL". It is legal for a capsule to have a "NULL" destructor. This makes a "NULL" return code somewhat ambiguous; use "PyCapsule_IsValid()" or "PyErr_Occurred()" to disambiguate. void *PyCapsule_GetContext(PyObject *capsule) * Part of the Stable ABI.* Return the current context stored in the capsule. On failure, set an exception and return "NULL". It is legal for a capsule to have a "NULL" context. This makes a "NULL" return code somewhat ambiguous; use "PyCapsule_IsValid()" or "PyErr_Occurred()" to disambiguate. const char *PyCapsule_GetName(PyObject *capsule) * Part of the Stable ABI.* Return the current name stored in the capsule. On failure, set an exception and return "NULL". It is legal for a capsule to have a "NULL" name. This makes a "NULL" return code somewhat ambiguous; use "PyCapsule_IsValid()" or "PyErr_Occurred()" to disambiguate. void *PyCapsule_Import(const char *name, int no_block) * Part of the Stable ABI.* Import a pointer to a C object from a capsule attribute in a module. The *name* parameter should specify the full name to the attribute, as in "module.attribute". The *name* stored in the capsule must match this string exactly. Return the capsule’s internal *pointer* on success. On failure, set an exception and return "NULL". Changed in version 3.3: *no_block* has no effect anymore. int PyCapsule_IsValid(PyObject *capsule, const char *name) * Part of the Stable ABI.* Determines whether or not *capsule* is a valid capsule. A valid capsule is non-"NULL", passes "PyCapsule_CheckExact()", has a non-"NULL" pointer stored in it, and its internal name matches the *name* parameter. (See "PyCapsule_GetPointer()" for information on how capsule names are compared.) In other words, if "PyCapsule_IsValid()" returns a true value, calls to any of the accessors (any function starting with "PyCapsule_Get") are guaranteed to succeed. 
Return a nonzero value if the object is valid and matches the name passed in. Return "0" otherwise. This function will not fail.

int PyCapsule_SetContext(PyObject *capsule, void *context) * Part of the Stable ABI.* Set the context pointer inside *capsule* to *context*. Return "0" on success. Return nonzero and set an exception on failure.

int PyCapsule_SetDestructor(PyObject *capsule, PyCapsule_Destructor destructor) * Part of the Stable ABI.* Set the destructor inside *capsule* to *destructor*. Return "0" on success. Return nonzero and set an exception on failure.

int PyCapsule_SetName(PyObject *capsule, const char *name) * Part of the Stable ABI.* Set the name inside *capsule* to *name*. If non-"NULL", the name must outlive the capsule. If the previous *name* stored in the capsule was not "NULL", no attempt is made to free it. Return "0" on success. Return nonzero and set an exception on failure.

int PyCapsule_SetPointer(PyObject *capsule, void *pointer) * Part of the Stable ABI.* Set the void pointer inside *capsule* to *pointer*. The pointer may not be "NULL". Return "0" on success. Return nonzero and set an exception on failure.

Cell Objects
************

“Cell” objects are used to implement variables referenced by multiple scopes. For each such variable, a cell object is created to store the value; the local variables of each stack frame that references the value contain a reference to the cells from outer scopes which also use that variable. When the value is accessed, the value contained in the cell is used instead of the cell object itself. This de-referencing of the cell object requires support from the generated byte-code; these are not automatically de-referenced when accessed. Cell objects are not likely to be useful elsewhere.

type PyCellObject The C structure used for cell objects.

PyTypeObject PyCell_Type The type object corresponding to cell objects.

int PyCell_Check(PyObject *ob) Return true if *ob* is a cell object; *ob* must not be "NULL". This function always succeeds.

PyObject *PyCell_New(PyObject *ob) *Return value: New reference.* Create and return a new cell object containing the value *ob*. The parameter may be "NULL".

PyObject *PyCell_Get(PyObject *cell) *Return value: New reference.* Return the contents of the cell *cell*, which can be "NULL". If *cell* is not a cell object, returns "NULL" with an exception set.

PyObject *PyCell_GET(PyObject *cell) *Return value: Borrowed reference.* Return the contents of the cell *cell*, but without checking that *cell* is non-"NULL" and a cell object.

int PyCell_Set(PyObject *cell, PyObject *value) Set the contents of the cell object *cell* to *value*. This releases the reference to any current content of the cell. *value* may be "NULL". *cell* must be non-"NULL". On success, return "0". If *cell* is not a cell object, set an exception and return "-1".

void PyCell_SET(PyObject *cell, PyObject *value) Sets the value of the cell object *cell* to *value*. No reference counts are adjusted, and no checks are made for safety; *cell* must be non-"NULL" and must be a cell object.

Code Objects
************

Code objects are a low-level detail of the CPython implementation. Each one represents a chunk of executable code that hasn’t yet been bound into a function.

type PyCodeObject The C structure of the objects used to describe code objects. The fields of this type are subject to change at any time.

PyTypeObject PyCode_Type This is an instance of "PyTypeObject" representing the Python code object.
int PyCode_Check(PyObject *co) Return true if *co* is a code object. This function always succeeds. Py_ssize_t PyCode_GetNumFree(PyCodeObject *co) Return the number of *free (closure) variables* in a code object. int PyUnstable_Code_GetFirstFree(PyCodeObject *co) *This is Unstable API. It may change without warning in minor releases.* Return the position of the first *free (closure) variable* in a code object. Changed in version 3.13: Renamed from "PyCode_GetFirstFree" as part of Unstable C API. The old name is deprecated, but will remain available until the signature changes again. PyCodeObject *PyUnstable_Code_New(int argcount, int kwonlyargcount, int nlocals, int stacksize, int flags, PyObject *code, PyObject *consts, PyObject *names, PyObject *varnames, PyObject *freevars, PyObject *cellvars, PyObject *filename, PyObject *name, PyObject *qualname, int firstlineno, PyObject *linetable, PyObject *exceptiontable) *This is Unstable API. It may change without warning in minor releases.* Return a new code object. If you need a dummy code object to create a frame, use "PyCode_NewEmpty()" instead. Since the definition of the bytecode changes often, calling "PyUnstable_Code_New()" directly can bind you to a precise Python version. The many arguments of this function are inter-dependent in complex ways, meaning that subtle changes to values are likely to result in incorrect execution or VM crashes. Use this function only with extreme care. Changed in version 3.11: Added "qualname" and "exceptiontable" parameters. Changed in version 3.12: Renamed from "PyCode_New" as part of Unstable C API. The old name is deprecated, but will remain available until the signature changes again. PyCodeObject *PyUnstable_Code_NewWithPosOnlyArgs(int argcount, int posonlyargcount, int kwonlyargcount, int nlocals, int stacksize, int flags, PyObject *code, PyObject *consts, PyObject *names, PyObject *varnames, PyObject *freevars, PyObject *cellvars, PyObject *filename, PyObject *name, PyObject *qualname, int firstlineno, PyObject *linetable, PyObject *exceptiontable) *This is Unstable API. It may change without warning in minor releases.* Similar to "PyUnstable_Code_New()", but with an extra “posonlyargcount” for positional-only arguments. The same caveats that apply to "PyUnstable_Code_New" also apply to this function. Added in version 3.8: as "PyCode_NewWithPosOnlyArgs" Changed in version 3.11: Added "qualname" and "exceptiontable" parameters. Changed in version 3.12: Renamed to "PyUnstable_Code_NewWithPosOnlyArgs". The old name is deprecated, but will remain available until the signature changes again. PyCodeObject *PyCode_NewEmpty(const char *filename, const char *funcname, int firstlineno) *Return value: New reference.* Return a new empty code object with the specified filename, function name, and first line number. The resulting code object will raise an "Exception" if executed. int PyCode_Addr2Line(PyCodeObject *co, int byte_offset) Return the line number of the instruction that occurs on or before "byte_offset" and ends after it. If you just need the line number of a frame, use "PyFrame_GetLineNumber()" instead. For efficiently iterating over the line numbers in a code object, use **the API described in PEP 626**. int PyCode_Addr2Location(PyObject *co, int byte_offset, int *start_line, int *start_column, int *end_line, int *end_column) Sets the passed "int" pointers to the source code line and column numbers for the instruction at "byte_offset". 
Sets the value to "0" when information is not available for any particular element. Returns "1" if the function succeeds and 0 otherwise. Added in version 3.11. PyObject *PyCode_GetCode(PyCodeObject *co) Equivalent to the Python code "getattr(co, 'co_code')". Returns a strong reference to a "PyBytesObject" representing the bytecode in a code object. On error, "NULL" is returned and an exception is raised. This "PyBytesObject" may be created on-demand by the interpreter and does not necessarily represent the bytecode actually executed by CPython. The primary use case for this function is debuggers and profilers. Added in version 3.11. PyObject *PyCode_GetVarnames(PyCodeObject *co) Equivalent to the Python code "getattr(co, 'co_varnames')". Returns a new reference to a "PyTupleObject" containing the names of the local variables. On error, "NULL" is returned and an exception is raised. Added in version 3.11. PyObject *PyCode_GetCellvars(PyCodeObject *co) Equivalent to the Python code "getattr(co, 'co_cellvars')". Returns a new reference to a "PyTupleObject" containing the names of the local variables that are referenced by nested functions. On error, "NULL" is returned and an exception is raised. Added in version 3.11. PyObject *PyCode_GetFreevars(PyCodeObject *co) Equivalent to the Python code "getattr(co, 'co_freevars')". Returns a new reference to a "PyTupleObject" containing the names of the *free (closure) variables*. On error, "NULL" is returned and an exception is raised. Added in version 3.11. int PyCode_AddWatcher(PyCode_WatchCallback callback) Register *callback* as a code object watcher for the current interpreter. Return an ID which may be passed to "PyCode_ClearWatcher()". In case of error (e.g. no more watcher IDs available), return "-1" and set an exception. Added in version 3.12. int PyCode_ClearWatcher(int watcher_id) Clear watcher identified by *watcher_id* previously returned from "PyCode_AddWatcher()" for the current interpreter. Return "0" on success, or "-1" and set an exception on error (e.g. if the given *watcher_id* was never registered.) Added in version 3.12. type PyCodeEvent Enumeration of possible code object watcher events: - "PY_CODE_EVENT_CREATE" - "PY_CODE_EVENT_DESTROY" Added in version 3.12. typedef int (*PyCode_WatchCallback)(PyCodeEvent event, PyCodeObject *co) Type of a code object watcher callback function. If *event* is "PY_CODE_EVENT_CREATE", then the callback is invoked after *co* has been fully initialized. Otherwise, the callback is invoked before the destruction of *co* takes place, so the prior state of *co* can be inspected. If *event* is "PY_CODE_EVENT_DESTROY", taking a reference in the callback to the about-to-be-destroyed code object will resurrect it and prevent it from being freed at this time. When the resurrected object is destroyed later, any watcher callbacks active at that time will be called again. Users of this API should not rely on internal runtime implementation details. Such details may include, but are not limited to, the exact order and timing of creation and destruction of code objects. While changes in these details may result in differences observable by watchers (including whether a callback is invoked or not), it does not change the semantics of the Python code being executed. If the callback sets an exception, it must return "-1"; this exception will be printed as an unraisable exception using "PyErr_WriteUnraisable()". Otherwise it should return "0". There may already be a pending exception set on entry to the callback. 
In this case, the callback should return "0" with the same exception still set. This means the callback may not call any other API that can set an exception unless it saves and clears the exception state first, and restores it before returning.

Added in version 3.12.

Extra information
*****************

To support low-level extensions to frame evaluation, such as external just-in-time compilers, it is possible to attach arbitrary extra data to code objects.

These functions are part of the unstable C API tier: this functionality is a CPython implementation detail, and the API may change without deprecation warnings.

Py_ssize_t PyUnstable_Eval_RequestCodeExtraIndex(freefunc free) *This is Unstable API. It may change without warning in minor releases.* Return a new opaque index value used for adding data to code objects. You generally call this function once (per interpreter) and use the result with "PyCode_GetExtra" and "PyCode_SetExtra" to manipulate data on individual code objects. If *free* is not "NULL": when a code object is deallocated, *free* will be called on non-"NULL" data stored under the new index. Use "Py_DecRef()" when storing "PyObject". Added in version 3.6: as "_PyEval_RequestCodeExtraIndex" Changed in version 3.12: Renamed to "PyUnstable_Eval_RequestCodeExtraIndex". The old private name is deprecated, but will be available until the API changes.

int PyUnstable_Code_GetExtra(PyObject *code, Py_ssize_t index, void **extra) *This is Unstable API. It may change without warning in minor releases.* Set *extra* to the extra data stored under the given index. Return 0 on success. Set an exception and return -1 on failure. If no data was set under the index, set *extra* to "NULL" and return 0 without setting an exception. Added in version 3.6: as "_PyCode_GetExtra" Changed in version 3.12: Renamed to "PyUnstable_Code_GetExtra". The old private name is deprecated, but will be available until the API changes.

int PyUnstable_Code_SetExtra(PyObject *code, Py_ssize_t index, void *extra) *This is Unstable API. It may change without warning in minor releases.* Set the extra data stored under the given index to *extra*. Return 0 on success. Set an exception and return -1 on failure. Added in version 3.6: as "_PyCode_SetExtra" Changed in version 3.12: Renamed to "PyUnstable_Code_SetExtra". The old private name is deprecated, but will be available until the API changes.

Codec registry and support functions
************************************

int PyCodec_Register(PyObject *search_function) * Part of the Stable ABI.* Register a new codec search function. As a side effect, this tries to load the "encodings" package, if not yet done, to make sure that it is always first in the list of search functions.

int PyCodec_Unregister(PyObject *search_function) * Part of the Stable ABI since version 3.10.* Unregister a codec search function and clear the registry’s cache. If the search function is not registered, do nothing. Return 0 on success. Raise an exception and return -1 on error. Added in version 3.10.

int PyCodec_KnownEncoding(const char *encoding) * Part of the Stable ABI.* Return "1" or "0" depending on whether there is a registered codec for the given *encoding*. This function always succeeds.

PyObject *PyCodec_Encode(PyObject *object, const char *encoding, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Generic codec based encoding API.
*object* is passed through the encoder function found for the given *encoding* using the error handling method defined by *errors*. *errors* may be "NULL" to use the default method defined for the codec. Raises a "LookupError" if no encoder can be found.

PyObject *PyCodec_Decode(PyObject *object, const char *encoding, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Generic codec based decoding API. *object* is passed through the decoder function found for the given *encoding* using the error handling method defined by *errors*. *errors* may be "NULL" to use the default method defined for the codec. Raises a "LookupError" if no decoder can be found.

Codec lookup API
================

In the following functions, the *encoding* string is converted to all lower-case characters before it is looked up, which makes encodings looked up through this mechanism effectively case-insensitive. If no codec is found, a "KeyError" is set and "NULL" returned.

PyObject *PyCodec_Encoder(const char *encoding) *Return value: New reference.** Part of the Stable ABI.* Get an encoder function for the given *encoding*.

PyObject *PyCodec_Decoder(const char *encoding) *Return value: New reference.** Part of the Stable ABI.* Get a decoder function for the given *encoding*.

PyObject *PyCodec_IncrementalEncoder(const char *encoding, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Get an "IncrementalEncoder" object for the given *encoding*.

PyObject *PyCodec_IncrementalDecoder(const char *encoding, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Get an "IncrementalDecoder" object for the given *encoding*.

PyObject *PyCodec_StreamReader(const char *encoding, PyObject *stream, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Get a "StreamReader" factory function for the given *encoding*.

PyObject *PyCodec_StreamWriter(const char *encoding, PyObject *stream, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Get a "StreamWriter" factory function for the given *encoding*.

Registry API for Unicode encoding error handlers
================================================

int PyCodec_RegisterError(const char *name, PyObject *error) * Part of the Stable ABI.* Register the error handling callback function *error* under the given *name*. This callback function will be called by a codec when it encounters unencodable characters/undecodable bytes and *name* is specified as the error parameter in the call to the encode/decode function. The callback gets a single argument, an instance of "UnicodeEncodeError", "UnicodeDecodeError" or "UnicodeTranslateError" that holds information about the problematic sequence of characters or bytes and their offset in the original string (see Unicode Exception Objects for functions to extract this information). The callback must either raise the given exception, or return a two-item tuple containing the replacement for the problematic sequence, and an integer giving the offset in the original string at which encoding/decoding should be resumed. Return "0" on success, "-1" on error.

PyObject *PyCodec_LookupError(const char *name) *Return value: New reference.** Part of the Stable ABI.* Look up the error handling callback function registered under *name*. As a special case "NULL" can be passed, in which case the error handling callback for “strict” will be returned.
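To illustrate the callback contract of "PyCodec_RegisterError()" described above, here is a hypothetical handler (all names are illustrative) that replaces each unencodable character with "-" and resumes encoding after it:

   static PyObject *
   dash_errors(PyObject *self, PyObject *exc)
   {
       Py_ssize_t end;
       if (PyUnicodeEncodeError_GetEnd(exc, &end) < 0) {
           return NULL;
       }
       /* Return (replacement, position at which to resume). */
       return Py_BuildValue("(sn)", "-", end);
   }

   static PyMethodDef dash_errors_def = {
       "dash_errors", dash_errors, METH_O, NULL
   };

   /* Registration, e.g. during module initialization: */
   static int
   register_dash_errors(void)
   {
       PyObject *handler = PyCFunction_New(&dash_errors_def, NULL);
       if (handler == NULL) {
           return -1;
       }
       int rc = PyCodec_RegisterError("dash-replace", handler);
       Py_DECREF(handler);
       return rc;
   }

Once registered, the handler is also selectable from Python code, e.g. ""x\u20ac".encode("ascii", errors="dash-replace")".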
PyObject *PyCodec_StrictErrors(PyObject *exc) *Return value: Always NULL.** Part of the Stable ABI.* Raise *exc* as an exception. PyObject *PyCodec_IgnoreErrors(PyObject *exc) *Return value: New reference.** Part of the Stable ABI.* Ignore the unicode error, skipping the faulty input. PyObject *PyCodec_ReplaceErrors(PyObject *exc) *Return value: New reference.** Part of the Stable ABI.* Replace the unicode encode error with "?" or "U+FFFD". PyObject *PyCodec_XMLCharRefReplaceErrors(PyObject *exc) *Return value: New reference.** Part of the Stable ABI.* Replace the unicode encode error with XML character references. PyObject *PyCodec_BackslashReplaceErrors(PyObject *exc) *Return value: New reference.** Part of the Stable ABI.* Replace the unicode encode error with backslash escapes ("\x", "\u" and "\U"). PyObject *PyCodec_NameReplaceErrors(PyObject *exc) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Replace the unicode encode error with "\N{...}" escapes. Added in version 3.5. Complex Number Objects ********************** Python’s complex number objects are implemented as two distinct types when viewed from the C API: one is the Python object exposed to Python programs, and the other is a C structure which represents the actual complex number value. The API provides functions for working with both. Complex Numbers as C Structures =============================== Note that the functions which accept these structures as parameters and return them as results do so *by value* rather than dereferencing them through pointers. This is consistent throughout the API. type Py_complex The C structure which corresponds to the value portion of a Python complex number object. Most of the functions for dealing with complex number objects use structures of this type as input or output values, as appropriate. double real double imag The structure is defined as: typedef struct { double real; double imag; } Py_complex; Py_complex _Py_c_sum(Py_complex left, Py_complex right) Return the sum of two complex numbers, using the C "Py_complex" representation. Py_complex _Py_c_diff(Py_complex left, Py_complex right) Return the difference between two complex numbers, using the C "Py_complex" representation. Py_complex _Py_c_neg(Py_complex num) Return the negation of the complex number *num*, using the C "Py_complex" representation. Py_complex _Py_c_prod(Py_complex left, Py_complex right) Return the product of two complex numbers, using the C "Py_complex" representation. Py_complex _Py_c_quot(Py_complex dividend, Py_complex divisor) Return the quotient of two complex numbers, using the C "Py_complex" representation. If *divisor* is null, this method returns zero and sets "errno" to "EDOM". Py_complex _Py_c_pow(Py_complex num, Py_complex exp) Return the exponentiation of *num* by *exp*, using the C "Py_complex" representation. If *num* is null and *exp* is not a positive real number, this method returns zero and sets "errno" to "EDOM". Complex Numbers as Python Objects ================================= type PyComplexObject This subtype of "PyObject" represents a Python complex number object. PyTypeObject PyComplex_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python complex number type. It is the same object as "complex" in the Python layer. int PyComplex_Check(PyObject *p) Return true if its argument is a "PyComplexObject" or a subtype of "PyComplexObject". This function always succeeds. 
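As a brief illustration of the C-structure helpers above (the function name is hypothetical), the value "((1+2j) * (3-1j)) / (0.5+0j)" can be computed entirely at the C level:

   #include <errno.h>
   #include <Python.h>

   static Py_complex
   scaled_product(void)
   {
       Py_complex a = {1.0, 2.0};
       Py_complex b = {3.0, -1.0};
       Py_complex half = {0.5, 0.0};

       Py_complex prod = _Py_c_prod(a, b);        /* (5+5j) */
       errno = 0;
       Py_complex quot = _Py_c_quot(prod, half);  /* (10+10j) */
       if (errno == EDOM) {
           /* the divisor was zero; quot is zero in that case */
       }
       return quot;
   }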
int PyComplex_CheckExact(PyObject *p) Return true if its argument is a "PyComplexObject", but not a subtype of "PyComplexObject". This function always succeeds. PyObject *PyComplex_FromCComplex(Py_complex v) *Return value: New reference.* Create a new Python complex number object from a C "Py_complex" value. Return "NULL" with an exception set on error. PyObject *PyComplex_FromDoubles(double real, double imag) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyComplexObject" object from *real* and *imag*. Return "NULL" with an exception set on error. double PyComplex_RealAsDouble(PyObject *op) * Part of the Stable ABI.* Return the real part of *op* as a C double. If *op* is not a Python complex number object but has a "__complex__()" method, this method will first be called to convert *op* to a Python complex number object. If "__complex__()" is not defined then it falls back to call "PyFloat_AsDouble()" and returns its result. Upon failure, this method returns "-1.0" with an exception set, so one should call "PyErr_Occurred()" to check for errors. Changed in version 3.13: Use "__complex__()" if available. double PyComplex_ImagAsDouble(PyObject *op) * Part of the Stable ABI.* Return the imaginary part of *op* as a C double. If *op* is not a Python complex number object but has a "__complex__()" method, this method will first be called to convert *op* to a Python complex number object. If "__complex__()" is not defined then it falls back to call "PyFloat_AsDouble()" and returns "0.0" on success. Upon failure, this method returns "-1.0" with an exception set, so one should call "PyErr_Occurred()" to check for errors. Changed in version 3.13: Use "__complex__()" if available. Py_complex PyComplex_AsCComplex(PyObject *op) Return the "Py_complex" value of the complex number *op*. If *op* is not a Python complex number object but has a "__complex__()" method, this method will first be called to convert *op* to a Python complex number object. If "__complex__()" is not defined then it falls back to "__float__()". If "__float__()" is not defined then it falls back to "__index__()". Upon failure, this method returns "Py_complex" with "real" set to "-1.0" and with an exception set, so one should call "PyErr_Occurred()" to check for errors. Changed in version 3.8: Use "__index__()" if available. Concrete Objects Layer ********************** The functions in this chapter are specific to certain Python object types. Passing them an object of the wrong type is not a good idea; if you receive an object from a Python program and you are not sure that it has the right type, you must perform a type check first; for example, to check that an object is a dictionary, use "PyDict_Check()". The chapter is structured like the “family tree” of Python object types. Warning: While the functions described in this chapter carefully check the type of the objects which are passed in, many of them do not check for "NULL" being passed instead of a valid object. Allowing "NULL" to be passed in can cause memory access violations and immediate termination of the interpreter. Fundamental Objects =================== This section describes Python type objects and the singleton object "None". 
* Type Objects * Creating Heap-Allocated Types * The "None" Object Numeric Objects =============== * Integer Objects * Boolean Objects * Floating-Point Objects * Pack and Unpack functions * Pack functions * Unpack functions * Complex Number Objects * Complex Numbers as C Structures * Complex Numbers as Python Objects Sequence Objects ================ Generic operations on sequence objects were discussed in the previous chapter; this section deals with the specific kinds of sequence objects that are intrinsic to the Python language. * Bytes Objects * Byte Array Objects * Type check macros * Direct API functions * Macros * Unicode Objects and Codecs * Unicode Objects * Unicode Type * Unicode Character Properties * Creating and accessing Unicode strings * Locale Encoding * File System Encoding * wchar_t Support * Built-in Codecs * Generic Codecs * UTF-8 Codecs * UTF-32 Codecs * UTF-16 Codecs * UTF-7 Codecs * Unicode-Escape Codecs * Raw-Unicode-Escape Codecs * Latin-1 Codecs * ASCII Codecs * Character Map Codecs * MBCS codecs for Windows * Methods and Slot Functions * Tuple Objects * Struct Sequence Objects * List Objects Container Objects ================= * Dictionary Objects * Set Objects Function Objects ================ * Function Objects * Instance Method Objects * Method Objects * Cell Objects * Code Objects * Extra information Other Objects ============= * File Objects * Module Objects * Initializing C modules * Single-phase initialization * Multi-phase initialization * Low-level module creation functions * Support functions * Module lookup * Iterator Objects * Descriptor Objects * Slice Objects * Ellipsis Object * MemoryView objects * Weak Reference Objects * Capsules * Frame Objects * Frame Locals Proxies * Internal Frames * Generator Objects * Coroutine Objects * Context Variables Objects * DateTime Objects * Objects for Type Hinting Context Variables Objects ************************* Added in version 3.7. Changed in version 3.7.1: Note: In Python 3.7.1 the signatures of all context variables C APIs were **changed** to use "PyObject" pointers instead of "PyContext", "PyContextVar", and "PyContextToken", e.g.: // in 3.7.0: PyContext *PyContext_New(void); // in 3.7.1+: PyObject *PyContext_New(void); See bpo-34762 for more details. This section details the public C API for the "contextvars" module. type PyContext The C structure used to represent a "contextvars.Context" object. type PyContextVar The C structure used to represent a "contextvars.ContextVar" object. type PyContextToken The C structure used to represent a "contextvars.Token" object. PyTypeObject PyContext_Type The type object representing the *context* type. PyTypeObject PyContextVar_Type The type object representing the *context variable* type. PyTypeObject PyContextToken_Type The type object representing the *context variable token* type. Type-check macros: int PyContext_CheckExact(PyObject *o) Return true if *o* is of type "PyContext_Type". *o* must not be "NULL". This function always succeeds. int PyContextVar_CheckExact(PyObject *o) Return true if *o* is of type "PyContextVar_Type". *o* must not be "NULL". This function always succeeds. int PyContextToken_CheckExact(PyObject *o) Return true if *o* is of type "PyContextToken_Type". *o* must not be "NULL". This function always succeeds. Context object management functions: PyObject *PyContext_New(void) *Return value: New reference.* Create a new empty context object. Returns "NULL" if an error has occurred. 
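As an illustrative sketch (the helper name is hypothetical), a fresh context can be used to run a callback whose "PyContextVar_Set()" calls do not leak into the caller; "PyContext_Enter()" and "PyContext_Exit()" are documented below:

   static PyObject *
   run_isolated(PyObject *callable)
   {
       PyObject *ctx = PyContext_New();
       if (ctx == NULL) {
           return NULL;
       }
       if (PyContext_Enter(ctx) < 0) {
           Py_DECREF(ctx);
           return NULL;
       }
       PyObject *result = PyObject_CallNoArgs(callable);
       if (PyContext_Exit(ctx) < 0) {
           Py_CLEAR(result);  /* drop the result and report the error */
       }
       Py_DECREF(ctx);
       return result;
   }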
PyObject *PyContext_Copy(PyObject *ctx) *Return value: New reference.* Create a shallow copy of the passed *ctx* context object. Returns "NULL" if an error has occurred. PyObject *PyContext_CopyCurrent(void) *Return value: New reference.* Create a shallow copy of the current thread context. Returns "NULL" if an error has occurred. int PyContext_Enter(PyObject *ctx) Set *ctx* as the current context for the current thread. Returns "0" on success, and "-1" on error. int PyContext_Exit(PyObject *ctx) Deactivate the *ctx* context and restore the previous context as the current context for the current thread. Returns "0" on success, and "-1" on error. Context variable functions: PyObject *PyContextVar_New(const char *name, PyObject *def) *Return value: New reference.* Create a new "ContextVar" object. The *name* parameter is used for introspection and debug purposes. The *def* parameter specifies a default value for the context variable, or "NULL" for no default. If an error has occurred, this function returns "NULL". int PyContextVar_Get(PyObject *var, PyObject *default_value, PyObject **value) Get the value of a context variable. Returns "-1" if an error has occurred during lookup, and "0" if no error occurred, whether or not a value was found. If the context variable was found, *value* will be a pointer to it. If the context variable was *not* found, *value* will point to: * *default_value*, if not "NULL"; * the default value of *var*, if not "NULL"; * "NULL" Except for "NULL", the function returns a new reference. PyObject *PyContextVar_Set(PyObject *var, PyObject *value) *Return value: New reference.* Set the value of *var* to *value* in the current context. Returns a new token object for this change, or "NULL" if an error has occurred. int PyContextVar_Reset(PyObject *var, PyObject *token) Reset the state of the *var* context variable to the state it was in before the "PyContextVar_Set()" call that returned *token*. This function returns "0" on success and "-1" on error. String conversion and formatting ******************************** Functions for number conversion and formatted string output. int PyOS_snprintf(char *str, size_t size, const char *format, ...) * Part of the Stable ABI.* Output not more than *size* bytes to *str* according to the format string *format* and the extra arguments. See the Unix man page *snprintf(3)*. int PyOS_vsnprintf(char *str, size_t size, const char *format, va_list va) * Part of the Stable ABI.* Output not more than *size* bytes to *str* according to the format string *format* and the variable argument list *va*. See the Unix man page *vsnprintf(3)*. "PyOS_snprintf()" and "PyOS_vsnprintf()" wrap the Standard C library functions "snprintf()" and "vsnprintf()". Their purpose is to guarantee consistent behavior in corner cases, which the Standard C functions do not. The wrappers ensure that "str[size-1]" is always "'\0'" upon return. They never write more than *size* bytes (including the trailing "'\0'") into *str*. Both functions require that "str != NULL", "size > 0", "format != NULL" and "size < INT_MAX". Note that this means there is no equivalent to the C99 "n = snprintf(NULL, 0, ...)" which would determine the necessary buffer size. The return value (*rv*) for these functions should be interpreted as follows: * When "0 <= rv < size", the output conversion was successful and *rv* characters were written to *str* (excluding the trailing "'\0'" byte at "str[rv]").
* When "rv >= size", the output conversion was truncated and a buffer with "rv + 1" bytes would have been needed to succeed. "str[size-1]" is "'\0'" in this case. * When "rv < 0", “something bad happened.” "str[size-1]" is "'\0'" in this case too, but the rest of *str* is undefined. The exact cause of the error depends on the underlying platform. The following functions provide locale-independent string to number conversions. unsigned long PyOS_strtoul(const char *str, char **ptr, int base) * Part of the Stable ABI.* Convert the initial part of the string in "str" to an unsigned long value according to the given "base", which must be between "2" and "36" inclusive, or be the special value "0". Leading white space and case of characters are ignored. If "base" is zero it looks for a leading "0b", "0o" or "0x" to tell which base. If these are absent it defaults to "10". Base must be 0 or between 2 and 36 (inclusive). If "ptr" is non-"NULL" it will contain a pointer to the end of the scan. If the converted value falls out of range of corresponding return type, range error occurs ("errno" is set to "ERANGE") and "ULONG_MAX" is returned. If no conversion can be performed, "0" is returned. See also the Unix man page *strtoul(3)*. Added in version 3.2. long PyOS_strtol(const char *str, char **ptr, int base) * Part of the Stable ABI.* Convert the initial part of the string in "str" to an long value according to the given "base", which must be between "2" and "36" inclusive, or be the special value "0". Same as "PyOS_strtoul()", but return a long value instead and "LONG_MAX" on overflows. See also the Unix man page *strtol(3)*. Added in version 3.2. double PyOS_string_to_double(const char *s, char **endptr, PyObject *overflow_exception) * Part of the Stable ABI.* Convert a string "s" to a double, raising a Python exception on failure. The set of accepted strings corresponds to the set of strings accepted by Python’s "float()" constructor, except that "s" must not have leading or trailing whitespace. The conversion is independent of the current locale. If "endptr" is "NULL", convert the whole string. Raise "ValueError" and return "-1.0" if the string is not a valid representation of a floating-point number. If endptr is not "NULL", convert as much of the string as possible and set "*endptr" to point to the first unconverted character. If no initial segment of the string is the valid representation of a floating-point number, set "*endptr" to point to the beginning of the string, raise ValueError, and return "-1.0". If "s" represents a value that is too large to store in a float (for example, ""1e500"" is such a string on many platforms) then if "overflow_exception" is "NULL" return "Py_HUGE_VAL" (with an appropriate sign) and don’t set any exception. Otherwise, "overflow_exception" must point to a Python exception object; raise that exception and return "-1.0". In both cases, set "*endptr" to point to the first character after the converted value. If any other error occurs during the conversion (for example an out-of-memory error), set the appropriate Python exception and return "-1.0". Added in version 3.1. char *PyOS_double_to_string(double val, char format_code, int precision, int flags, int *ptype) * Part of the Stable ABI.* Convert a double *val* to a string using supplied *format_code*, *precision*, and *flags*. *format_code* must be one of "'e'", "'E'", "'f'", "'F'", "'g'", "'G'" or "'r'". For "'r'", the supplied *precision* must be 0 and is ignored. 
The "'r'" format code specifies the standard "repr()" format. *flags* can be zero or more of the values "Py_DTSF_SIGN", "Py_DTSF_ADD_DOT_0", or "Py_DTSF_ALT", or-ed together: * "Py_DTSF_SIGN" means to always precede the returned string with a sign character, even if *val* is non-negative. * "Py_DTSF_ADD_DOT_0" means to ensure that the returned string will not look like an integer. * "Py_DTSF_ALT" means to apply “alternate” formatting rules. See the documentation for the "PyOS_snprintf()" "'#'" specifier for details. If *ptype* is non-"NULL", then the value it points to will be set to one of "Py_DTST_FINITE", "Py_DTST_INFINITE", or "Py_DTST_NAN", signifying that *val* is a finite number, an infinite number, or not a number, respectively. The return value is a pointer to *buffer* with the converted string or "NULL" if the conversion failed. The caller is responsible for freeing the returned string by calling "PyMem_Free()". Added in version 3.1. int PyOS_stricmp(const char *s1, const char *s2) Case insensitive comparison of strings. The function works almost identically to "strcmp()" except that it ignores the case. int PyOS_strnicmp(const char *s1, const char *s2, Py_ssize_t size) Case insensitive comparison of strings. The function works almost identically to "strncmp()" except that it ignores the case. Coroutine Objects ***************** Added in version 3.5. Coroutine objects are what functions declared with an "async" keyword return. type PyCoroObject The C structure used for coroutine objects. PyTypeObject PyCoro_Type The type object corresponding to coroutine objects. int PyCoro_CheckExact(PyObject *ob) Return true if *ob*’s type is "PyCoro_Type"; *ob* must not be "NULL". This function always succeeds. PyObject *PyCoro_New(PyFrameObject *frame, PyObject *name, PyObject *qualname) *Return value: New reference.* Create and return a new coroutine object based on the *frame* object, with "__name__" and "__qualname__" set to *name* and *qualname*. A reference to *frame* is stolen by this function. The *frame* argument must not be "NULL". DateTime Objects **************** Various date and time objects are supplied by the "datetime" module. Before using any of these functions, the header file "datetime.h" must be included in your source (note that this is not included by "Python.h"), and the macro "PyDateTime_IMPORT" must be invoked, usually as part of the module initialisation function. The macro puts a pointer to a C structure into a static variable, "PyDateTimeAPI", that is used by the following macros. type PyDateTime_Date This subtype of "PyObject" represents a Python date object. type PyDateTime_DateTime This subtype of "PyObject" represents a Python datetime object. type PyDateTime_Time This subtype of "PyObject" represents a Python time object. type PyDateTime_Delta This subtype of "PyObject" represents the difference between two datetime values. PyTypeObject PyDateTime_DateType This instance of "PyTypeObject" represents the Python date type; it is the same object as "datetime.date" in the Python layer. PyTypeObject PyDateTime_DateTimeType This instance of "PyTypeObject" represents the Python datetime type; it is the same object as "datetime.datetime" in the Python layer. PyTypeObject PyDateTime_TimeType This instance of "PyTypeObject" represents the Python time type; it is the same object as "datetime.time" in the Python layer. 
PyTypeObject PyDateTime_DeltaType This instance of "PyTypeObject" represents the Python type for the difference between two datetime values; it is the same object as "datetime.timedelta" in the Python layer. PyTypeObject PyDateTime_TZInfoType This instance of "PyTypeObject" represents the Python time zone info type; it is the same object as "datetime.tzinfo" in the Python layer. Macro for access to the UTC singleton: PyObject *PyDateTime_TimeZone_UTC Returns the time zone singleton representing UTC, the same object as "datetime.timezone.utc". Added in version 3.7. Type-check macros: int PyDate_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_DateType" or a subtype of "PyDateTime_DateType". *ob* must not be "NULL". This function always succeeds. int PyDate_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_DateType". *ob* must not be "NULL". This function always succeeds. int PyDateTime_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_DateTimeType" or a subtype of "PyDateTime_DateTimeType". *ob* must not be "NULL". This function always succeeds. int PyDateTime_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_DateTimeType". *ob* must not be "NULL". This function always succeeds. int PyTime_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_TimeType" or a subtype of "PyDateTime_TimeType". *ob* must not be "NULL". This function always succeeds. int PyTime_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_TimeType". *ob* must not be "NULL". This function always succeeds. int PyDelta_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_DeltaType" or a subtype of "PyDateTime_DeltaType". *ob* must not be "NULL". This function always succeeds. int PyDelta_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_DeltaType". *ob* must not be "NULL". This function always succeeds. int PyTZInfo_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_TZInfoType" or a subtype of "PyDateTime_TZInfoType". *ob* must not be "NULL". This function always succeeds. int PyTZInfo_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_TZInfoType". *ob* must not be "NULL". This function always succeeds. Macros to create objects: PyObject *PyDate_FromDate(int year, int month, int day) *Return value: New reference.* Return a "datetime.date" object with the specified year, month and day. PyObject *PyDateTime_FromDateAndTime(int year, int month, int day, int hour, int minute, int second, int usecond) *Return value: New reference.* Return a "datetime.datetime" object with the specified year, month, day, hour, minute, second and microsecond. PyObject *PyDateTime_FromDateAndTimeAndFold(int year, int month, int day, int hour, int minute, int second, int usecond, int fold) *Return value: New reference.* Return a "datetime.datetime" object with the specified year, month, day, hour, minute, second, microsecond and fold. Added in version 3.6. PyObject *PyTime_FromTime(int hour, int minute, int second, int usecond) *Return value: New reference.* Return a "datetime.time" object with the specified hour, minute, second and microsecond. PyObject *PyTime_FromTimeAndFold(int hour, int minute, int second, int usecond, int fold) *Return value: New reference.* Return a "datetime.time" object with the specified hour, minute, second, microsecond and fold. Added in version 3.6.
PyObject *PyDelta_FromDSU(int days, int seconds, int useconds) *Return value: New reference.* Return a "datetime.timedelta" object representing the given number of days, seconds and microseconds. Normalization is performed so that the resulting number of microseconds and seconds lie in the ranges documented for "datetime.timedelta" objects. PyObject *PyTimeZone_FromOffset(PyObject *offset) *Return value: New reference.* Return a "datetime.timezone" object with an unnamed fixed offset represented by the *offset* argument. Added in version 3.7. PyObject *PyTimeZone_FromOffsetAndName(PyObject *offset, PyObject *name) *Return value: New reference.* Return a "datetime.timezone" object with a fixed offset represented by the *offset* argument and with tzname *name*. Added in version 3.7. Macros to extract fields from date objects. The argument must be an instance of "PyDateTime_Date", including subclasses (such as "PyDateTime_DateTime"). The argument must not be "NULL", and the type is not checked: int PyDateTime_GET_YEAR(PyDateTime_Date *o) Return the year, as a positive int. int PyDateTime_GET_MONTH(PyDateTime_Date *o) Return the month, as an int from 1 through 12. int PyDateTime_GET_DAY(PyDateTime_Date *o) Return the day, as an int from 1 through 31. Macros to extract fields from datetime objects. The argument must be an instance of "PyDateTime_DateTime", including subclasses. The argument must not be "NULL", and the type is not checked: int PyDateTime_DATE_GET_HOUR(PyDateTime_DateTime *o) Return the hour, as an int from 0 through 23. int PyDateTime_DATE_GET_MINUTE(PyDateTime_DateTime *o) Return the minute, as an int from 0 through 59. int PyDateTime_DATE_GET_SECOND(PyDateTime_DateTime *o) Return the second, as an int from 0 through 59. int PyDateTime_DATE_GET_MICROSECOND(PyDateTime_DateTime *o) Return the microsecond, as an int from 0 through 999999. int PyDateTime_DATE_GET_FOLD(PyDateTime_DateTime *o) Return the fold, as an int from 0 through 1. Added in version 3.6. PyObject *PyDateTime_DATE_GET_TZINFO(PyDateTime_DateTime *o) Return the tzinfo (which may be "None"). Added in version 3.10. Macros to extract fields from time objects. The argument must be an instance of "PyDateTime_Time", including subclasses. The argument must not be "NULL", and the type is not checked: int PyDateTime_TIME_GET_HOUR(PyDateTime_Time *o) Return the hour, as an int from 0 through 23. int PyDateTime_TIME_GET_MINUTE(PyDateTime_Time *o) Return the minute, as an int from 0 through 59. int PyDateTime_TIME_GET_SECOND(PyDateTime_Time *o) Return the second, as an int from 0 through 59. int PyDateTime_TIME_GET_MICROSECOND(PyDateTime_Time *o) Return the microsecond, as an int from 0 through 999999. int PyDateTime_TIME_GET_FOLD(PyDateTime_Time *o) Return the fold, as an int from 0 through 1. Added in version 3.6. PyObject *PyDateTime_TIME_GET_TZINFO(PyDateTime_Time *o) Return the tzinfo (which may be "None"). Added in version 3.10. Macros to extract fields from time delta objects. The argument must be an instance of "PyDateTime_Delta", including subclasses. The argument must not be "NULL", and the type is not checked: int PyDateTime_DELTA_GET_DAYS(PyDateTime_Delta *o) Return the number of days, as an int from -999999999 to 999999999. Added in version 3.3. int PyDateTime_DELTA_GET_SECONDS(PyDateTime_Delta *o) Return the number of seconds, as an int from 0 through 86399. Added in version 3.3. int PyDateTime_DELTA_GET_MICROSECONDS(PyDateTime_Delta *o) Return the number of microseconds, as an int from 0 through 999999. 
Added in version 3.3. Macros for the convenience of modules implementing the DB API: PyObject *PyDateTime_FromTimestamp(PyObject *args) *Return value: New reference.* Create and return a new "datetime.datetime" object given an argument tuple suitable for passing to "datetime.datetime.fromtimestamp()". PyObject *PyDate_FromTimestamp(PyObject *args) *Return value: New reference.* Create and return a new "datetime.date" object given an argument tuple suitable for passing to "datetime.date.fromtimestamp()". Descriptor Objects ****************** “Descriptors” are objects that describe some attribute of an object. They are found in the dictionary of type objects. PyTypeObject PyProperty_Type * Part of the Stable ABI.* The type object for the built-in descriptor types. PyObject *PyDescr_NewGetSet(PyTypeObject *type, struct PyGetSetDef *getset) *Return value: New reference.** Part of the Stable ABI.* PyObject *PyDescr_NewMember(PyTypeObject *type, struct PyMemberDef *meth) *Return value: New reference.** Part of the Stable ABI.* PyObject *PyDescr_NewMethod(PyTypeObject *type, struct PyMethodDef *meth) *Return value: New reference.** Part of the Stable ABI.* PyObject *PyDescr_NewWrapper(PyTypeObject *type, struct wrapperbase *wrapper, void *wrapped) *Return value: New reference.* PyObject *PyDescr_NewClassMethod(PyTypeObject *type, PyMethodDef *method) *Return value: New reference.** Part of the Stable ABI.* int PyDescr_IsData(PyObject *descr) Return non-zero if the descriptor object *descr* describes a data attribute, or "0" if it describes a method. *descr* must be a descriptor object; there is no error checking. PyObject *PyWrapper_New(PyObject*, PyObject*) *Return value: New reference.** Part of the Stable ABI.* Dictionary Objects ****************** type PyDictObject This subtype of "PyObject" represents a Python dictionary object. PyTypeObject PyDict_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python dictionary type. This is the same object as "dict" in the Python layer. int PyDict_Check(PyObject *p) Return true if *p* is a dict object or an instance of a subtype of the dict type. This function always succeeds. int PyDict_CheckExact(PyObject *p) Return true if *p* is a dict object, but not an instance of a subtype of the dict type. This function always succeeds. PyObject *PyDict_New() *Return value: New reference.** Part of the Stable ABI.* Return a new empty dictionary, or "NULL" on failure. PyObject *PyDictProxy_New(PyObject *mapping) *Return value: New reference.** Part of the Stable ABI.* Return a "types.MappingProxyType" object for a mapping which enforces read-only behavior. This is normally used to create a view to prevent modification of the dictionary for non-dynamic class types. void PyDict_Clear(PyObject *p) * Part of the Stable ABI.* Empty an existing dictionary of all key-value pairs. int PyDict_Contains(PyObject *p, PyObject *key) * Part of the Stable ABI.* Determine if dictionary *p* contains *key*. If an item in *p* matches *key*, return "1", otherwise return "0". On error, return "-1". This is equivalent to the Python expression "key in p". int PyDict_ContainsString(PyObject *p, const char *key) This is the same as "PyDict_Contains()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. Added in version 3.13. PyObject *PyDict_Copy(PyObject *p) *Return value: New reference.** Part of the Stable ABI.* Return a new dictionary that contains the same key-value pairs as *p*.
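As a sketch (assuming a calling function that returns PyObject*), the functions above can be combined to hand out a read-only view of a freshly created dictionary:

   PyObject *dict = PyDict_New();
   if (dict == NULL) {
       return NULL;  /* exception set */
   }
   PyObject *proxy = PyDictProxy_New(dict);  /* read-only view of dict */
   Py_DECREF(dict);  /* the proxy holds its own reference to the mapping */
   if (proxy == NULL) {
       return NULL;
   }
   return proxy;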
int PyDict_SetItem(PyObject *p, PyObject *key, PyObject *val) * Part of the Stable ABI.* Insert *val* into the dictionary *p* with a key of *key*. *key* must be *hashable*; if it isn’t, "TypeError" will be raised. Return "0" on success or "-1" on failure. This function *does not* steal a reference to *val*. int PyDict_SetItemString(PyObject *p, const char *key, PyObject *val) * Part of the Stable ABI.* This is the same as "PyDict_SetItem()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. int PyDict_DelItem(PyObject *p, PyObject *key) * Part of the Stable ABI.* Remove the entry in dictionary *p* with key *key*. *key* must be *hashable*; if it isn’t, "TypeError" is raised. If *key* is not in the dictionary, "KeyError" is raised. Return "0" on success or "-1" on failure. int PyDict_DelItemString(PyObject *p, const char *key) * Part of the Stable ABI.* This is the same as "PyDict_DelItem()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. int PyDict_GetItemRef(PyObject *p, PyObject *key, PyObject **result) * Part of the Stable ABI since version 3.13.* Return a new *strong reference* to the object from dictionary *p* which has a key *key*: * If the key is present, set **result* to a new *strong reference* to the value and return "1". * If the key is missing, set **result* to "NULL" and return "0". * On error, raise an exception and return "-1". Added in version 3.13. See also the "PyObject_GetItem()" function. PyObject *PyDict_GetItem(PyObject *p, PyObject *key) *Return value: Borrowed reference.** Part of the Stable ABI.* Return a *borrowed reference* to the object from dictionary *p* which has a key *key*. Return "NULL" if the key *key* is missing *without* setting an exception. Note: Exceptions that occur while this calls "__hash__()" and "__eq__()" methods are silently ignored. Prefer the "PyDict_GetItemWithError()" function instead. Changed in version 3.10: Calling this API without the *GIL* held had been allowed for historical reasons. It is no longer allowed. PyObject *PyDict_GetItemWithError(PyObject *p, PyObject *key) *Return value: Borrowed reference.** Part of the Stable ABI.* Variant of "PyDict_GetItem()" that does not suppress exceptions. Return "NULL" **with** an exception set if an exception occurred. Return "NULL" **without** an exception set if the key wasn’t present. PyObject *PyDict_GetItemString(PyObject *p, const char *key) *Return value: Borrowed reference.** Part of the Stable ABI.* This is the same as "PyDict_GetItem()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. Note: Exceptions that occur while this calls "__hash__()" and "__eq__()" methods or while creating the temporary "str" object are silently ignored. Prefer using the "PyDict_GetItemWithError()" function with your own "PyUnicode_FromString()" *key* instead. int PyDict_GetItemStringRef(PyObject *p, const char *key, PyObject **result) * Part of the Stable ABI since version 3.13.* Similar to "PyDict_GetItemRef()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. Added in version 3.13. PyObject *PyDict_SetDefault(PyObject *p, PyObject *key, PyObject *defaultobj) *Return value: Borrowed reference.* This is the same as the Python-level "dict.setdefault()". If present, it returns the value corresponding to *key* from the dictionary *p*. If the key is not in the dict, it is inserted with value *defaultobj* and *defaultobj* is returned.
This function evaluates the hash function of *key* only once, instead of evaluating it independently for the lookup and the insertion. Added in version 3.4. int PyDict_SetDefaultRef(PyObject *p, PyObject *key, PyObject *default_value, PyObject **result) Inserts *default_value* into the dictionary *p* with a key of *key* if the key is not already present in the dictionary. If *result* is not "NULL", then **result* is set to a *strong reference* to either *default_value*, if the key was not present, or the existing value, if *key* was already present in the dictionary. Returns "1" if the key was present and *default_value* was not inserted, or "0" if the key was not present and *default_value* was inserted. On failure, returns "-1", sets an exception, and sets **result* to "NULL". For clarity: if you have a strong reference to *default_value* before calling this function, then after it returns, you hold a strong reference to both *default_value* and **result* (if it’s not "NULL"). These may refer to the same object: in that case you hold two separate references to it. Added in version 3.13. int PyDict_Pop(PyObject *p, PyObject *key, PyObject **result) Remove *key* from dictionary *p* and optionally return the removed value. Do not raise "KeyError" if the key is missing. * If the key is present, set **result* to a new reference to the removed value if *result* is not "NULL", and return "1". * If the key is missing, set **result* to "NULL" if *result* is not "NULL", and return "0". * On error, raise an exception and return "-1". Similar to "dict.pop()", but without the default value and not raising "KeyError" if the key is missing. Added in version 3.13. int PyDict_PopString(PyObject *p, const char *key, PyObject **result) Similar to "PyDict_Pop()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. Added in version 3.13. PyObject *PyDict_Items(PyObject *p) *Return value: New reference.** Part of the Stable ABI.* Return a "PyListObject" containing all the items from the dictionary. PyObject *PyDict_Keys(PyObject *p) *Return value: New reference.** Part of the Stable ABI.* Return a "PyListObject" containing all the keys from the dictionary. PyObject *PyDict_Values(PyObject *p) *Return value: New reference.** Part of the Stable ABI.* Return a "PyListObject" containing all the values from the dictionary *p*. Py_ssize_t PyDict_Size(PyObject *p) * Part of the Stable ABI.* Return the number of items in the dictionary. This is equivalent to "len(p)" on a dictionary. int PyDict_Next(PyObject *p, Py_ssize_t *ppos, PyObject **pkey, PyObject **pvalue) * Part of the Stable ABI.* Iterate over all key-value pairs in the dictionary *p*. The "Py_ssize_t" referred to by *ppos* must be initialized to "0" prior to the first call to this function to start the iteration; the function returns true for each pair in the dictionary, and false once all pairs have been reported. The parameters *pkey* and *pvalue* should either point to PyObject* variables that will be filled in with each key and value, respectively, or may be "NULL". Any references returned through them are borrowed. *ppos* should not be altered during iteration. Its value represents offsets within the internal dictionary structure, and since the structure is sparse, the offsets are not consecutive. For example:

   PyObject *key, *value;
   Py_ssize_t pos = 0;

   while (PyDict_Next(self->dict, &pos, &key, &value)) {
       /* do something interesting with the values... */
       ...
   }

The dictionary *p* should not be mutated during iteration. It is safe to modify the values of the keys as you iterate over the dictionary, but only so long as the set of keys does not change. For example:

   PyObject *key, *value;
   Py_ssize_t pos = 0;

   while (PyDict_Next(self->dict, &pos, &key, &value)) {
       long i = PyLong_AsLong(value);
       if (i == -1 && PyErr_Occurred()) {
           return -1;
       }
       PyObject *o = PyLong_FromLong(i + 1);
       if (o == NULL)
           return -1;
       if (PyDict_SetItem(self->dict, key, o) < 0) {
           Py_DECREF(o);
           return -1;
       }
       Py_DECREF(o);
   }

The function is not thread-safe in the *free-threaded* build without external synchronization. You can use "Py_BEGIN_CRITICAL_SECTION" to lock the dictionary while iterating over it:

   Py_BEGIN_CRITICAL_SECTION(self->dict);
   while (PyDict_Next(self->dict, &pos, &key, &value)) {
       ...
   }
   Py_END_CRITICAL_SECTION();

int PyDict_Merge(PyObject *a, PyObject *b, int override) * Part of the Stable ABI.* Iterate over mapping object *b* adding key-value pairs to dictionary *a*. *b* may be a dictionary, or any object supporting "PyMapping_Keys()" and "PyObject_GetItem()". If *override* is true, existing pairs in *a* will be replaced if a matching key is found in *b*, otherwise pairs will only be added if there is not a matching key in *a*. Return "0" on success or "-1" if an exception was raised. int PyDict_Update(PyObject *a, PyObject *b) * Part of the Stable ABI.* This is the same as "PyDict_Merge(a, b, 1)" in C, and is similar to "a.update(b)" in Python except that "PyDict_Update()" doesn’t fall back to iterating over a sequence of key-value pairs if the second argument has no “keys” attribute. Return "0" on success or "-1" if an exception was raised. int PyDict_MergeFromSeq2(PyObject *a, PyObject *seq2, int override) * Part of the Stable ABI.* Update or merge into dictionary *a*, from the key-value pairs in *seq2*. *seq2* must be an iterable object producing iterable objects of length 2, viewed as key-value pairs. In case of duplicate keys, the last wins if *override* is true, else the first wins. Return "0" on success or "-1" if an exception was raised. Equivalent Python (except for the return value):

   def PyDict_MergeFromSeq2(a, seq2, override):
       for key, value in seq2:
           if override or key not in a:
               a[key] = value

int PyDict_AddWatcher(PyDict_WatchCallback callback) Register *callback* as a dictionary watcher. Return a non-negative integer id which must be passed to future calls to "PyDict_Watch()". In case of error (e.g. no more watcher IDs available), return "-1" and set an exception. Added in version 3.12. int PyDict_ClearWatcher(int watcher_id) Clear watcher identified by *watcher_id* previously returned from "PyDict_AddWatcher()". Return "0" on success, "-1" on error (e.g. if the given *watcher_id* was never registered). Added in version 3.12. int PyDict_Watch(int watcher_id, PyObject *dict) Mark dictionary *dict* as watched. The callback granted *watcher_id* by "PyDict_AddWatcher()" will be called when *dict* is modified or deallocated. Return "0" on success or "-1" on error. Added in version 3.12. int PyDict_Unwatch(int watcher_id, PyObject *dict) Mark dictionary *dict* as no longer watched. The callback granted *watcher_id* by "PyDict_AddWatcher()" will no longer be called when *dict* is modified or deallocated. The dict must previously have been watched by this watcher. Return "0" on success or "-1" on error. Added in version 3.12.
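For illustration, a sketch of a watcher that merely counts events; the callback signature is the "PyDict_WatchCallback" type documented below, and "watched_dict" stands in for a dictionary of interest:

   static Py_ssize_t n_events = 0;

   static int
   count_events(PyDict_WatchEvent event, PyObject *dict,
                PyObject *key, PyObject *new_value)
   {
       /* must not modify dict or run Python code here */
       n_events++;
       return 0;
   }

   ...
   int wid = PyDict_AddWatcher(count_events);
   if (wid == -1) {
       return NULL;  /* exception set */
   }
   if (PyDict_Watch(wid, watched_dict) < 0) {
       return NULL;  /* exception set */
   }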
type PyDict_WatchEvent Enumeration of possible dictionary watcher events: "PyDict_EVENT_ADDED", "PyDict_EVENT_MODIFIED", "PyDict_EVENT_DELETED", "PyDict_EVENT_CLONED", "PyDict_EVENT_CLEARED", or "PyDict_EVENT_DEALLOCATED". Added in version 3.12. typedef int (*PyDict_WatchCallback)(PyDict_WatchEvent event, PyObject *dict, PyObject *key, PyObject *new_value) Type of a dict watcher callback function. If *event* is "PyDict_EVENT_CLEARED" or "PyDict_EVENT_DEALLOCATED", both *key* and *new_value* will be "NULL". If *event* is "PyDict_EVENT_ADDED" or "PyDict_EVENT_MODIFIED", *new_value* will be the new value for *key*. If *event* is "PyDict_EVENT_DELETED", *key* is being deleted from the dictionary and *new_value* will be "NULL". "PyDict_EVENT_CLONED" occurs when *dict* was previously empty and another dict is merged into it. To maintain efficiency of this operation, per-key "PyDict_EVENT_ADDED" events are not issued in this case; instead a single "PyDict_EVENT_CLONED" is issued, and *key* will be the source dictionary. The callback may inspect but must not modify *dict*; doing so could have unpredictable effects, including infinite recursion. Do not trigger Python code execution in the callback, as it could modify the dict as a side effect. If *event* is "PyDict_EVENT_DEALLOCATED", taking a new reference in the callback to the about-to-be-destroyed dictionary will resurrect it and prevent it from being freed at this time. When the resurrected object is destroyed later, any watcher callbacks active at that time will be called again. Callbacks occur before the notified modification to *dict* takes place, so the prior state of *dict* can be inspected. If the callback sets an exception, it must return "-1"; this exception will be printed as an unraisable exception using "PyErr_WriteUnraisable()". Otherwise it should return "0". There may already be a pending exception set on entry to the callback. In this case, the callback should return "0" with the same exception still set. This means the callback may not call any other API that can set an exception unless it saves and clears the exception state first, and restores it before returning. Added in version 3.12. Exception Handling ****************** The functions described in this chapter will let you handle and raise Python exceptions. It is important to understand some of the basics of Python exception handling. It works somewhat like the POSIX "errno" variable: there is a global indicator (per thread) of the last error that occurred. Most C API functions don’t clear this on success, but will set it to indicate the cause of the error on failure. Most C API functions also return an error indicator, usually "NULL" if they are supposed to return a pointer, or "-1" if they return an integer (exception: the "PyArg_*" functions return "1" for success and "0" for failure). Concretely, the error indicator consists of three object pointers: the exception’s type, the exception’s value, and the traceback object. Any of those pointers can be "NULL" if not set (although some combinations are forbidden, for example you can’t have a non-"NULL" traceback if the exception type is "NULL"). When a function must fail because some function it called failed, it generally doesn’t set the error indicator; the function it called already set it.
The failing function is responsible for either handling the error and clearing the exception or returning after cleaning up any resources it holds (such as object references or memory allocations); it should *not* continue normally if it is not prepared to handle the error. If returning due to an error, it is important to indicate to the caller that an error has been set. If the error is not handled or carefully propagated, additional calls into the Python/C API may not behave as intended and may fail in mysterious ways. Note: The error indicator is **not** the result of "sys.exc_info()". The former corresponds to an exception that is not yet caught (and is therefore still propagating), while the latter returns an exception after it is caught (and has therefore stopped propagating). Printing and clearing ===================== void PyErr_Clear() * Part of the Stable ABI.* Clear the error indicator. If the error indicator is not set, there is no effect. void PyErr_PrintEx(int set_sys_last_vars) * Part of the Stable ABI.* Print a standard traceback to "sys.stderr" and clear the error indicator, **unless** the error is a "SystemExit"; in that case, no traceback is printed and the Python process will exit with the error code specified by the "SystemExit" instance. Call this function **only** when the error indicator is set. Otherwise it will cause a fatal error! If *set_sys_last_vars* is nonzero, the variable "sys.last_exc" is set to the printed exception. For backwards compatibility, the deprecated variables "sys.last_type", "sys.last_value" and "sys.last_traceback" are also set to the type, value and traceback of this exception, respectively. Changed in version 3.12: The setting of "sys.last_exc" was added. void PyErr_Print() * Part of the Stable ABI.* Alias for "PyErr_PrintEx(1)". void PyErr_WriteUnraisable(PyObject *obj) * Part of the Stable ABI.* Call "sys.unraisablehook()" using the current exception and *obj* argument. This utility function prints a warning message to "sys.stderr" when an exception has been set but it is impossible for the interpreter to actually raise the exception. It is used, for example, when an exception occurs in an "__del__()" method. The function is called with a single argument *obj* that identifies the context in which the unraisable exception occurred. If possible, the repr of *obj* will be printed in the warning message. If *obj* is "NULL", only the traceback is printed. An exception must be set when calling this function. Changed in version 3.4: Print a traceback. Print only the traceback if *obj* is "NULL". Changed in version 3.8: Use "sys.unraisablehook()". void PyErr_FormatUnraisable(const char *format, ...) Similar to "PyErr_WriteUnraisable()", but the *format* and subsequent parameters help format the warning message; they have the same meaning and values as in "PyUnicode_FromFormat()". "PyErr_WriteUnraisable(obj)" is roughly equivalent to "PyErr_FormatUnraisable("Exception ignored in: %R", obj)". If *format* is "NULL", only the traceback is printed. Added in version 3.13. void PyErr_DisplayException(PyObject *exc) * Part of the Stable ABI since version 3.12.* Print the standard traceback display of "exc" to "sys.stderr", including chained exceptions and notes. Added in version 3.12. Raising exceptions ================== These functions help you set the current thread’s error indicator. For convenience, some of these functions will always return a "NULL" pointer for use in a "return" statement.
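For example, a hypothetical module function (a sketch, assuming <math.h> is included; "my_sqrt" is not a documented API) typically pairs "PyErr_SetString()", described below, with a "NULL" return:

   static PyObject *
   my_sqrt(PyObject *self, PyObject *arg)
   {
       double x = PyFloat_AsDouble(arg);
       if (x == -1.0 && PyErr_Occurred()) {
           return NULL;  /* conversion failed; exception already set */
       }
       if (x < 0.0) {
           PyErr_SetString(PyExc_ValueError, "expected a non-negative number");
           return NULL;  /* the NULL return tells the caller an error is set */
       }
       return PyFloat_FromDouble(sqrt(x));
   }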
void PyErr_SetString(PyObject *type, const char *message) * Part of the Stable ABI.* This is the most common way to set the error indicator. The first argument specifies the exception type; it is normally one of the standard exceptions, e.g. "PyExc_RuntimeError". You need not create a new *strong reference* to it (e.g. with "Py_INCREF()"). The second argument is an error message; it is decoded from "'utf-8'". void PyErr_SetObject(PyObject *type, PyObject *value) * Part of the Stable ABI.* This function is similar to "PyErr_SetString()" but lets you specify an arbitrary Python object for the “value” of the exception. PyObject *PyErr_Format(PyObject *exception, const char *format, ...) *Return value: Always NULL.** Part of the Stable ABI.* This function sets the error indicator and returns "NULL". *exception* should be a Python exception class. The *format* and subsequent parameters help format the error message; they have the same meaning and values as in "PyUnicode_FromFormat()". *format* is an ASCII-encoded string. PyObject *PyErr_FormatV(PyObject *exception, const char *format, va_list vargs) *Return value: Always NULL.** Part of the Stable ABI since version 3.5.* Same as "PyErr_Format()", but taking a "va_list" argument rather than a variable number of arguments. Added in version 3.5. void PyErr_SetNone(PyObject *type) * Part of the Stable ABI.* This is a shorthand for "PyErr_SetObject(type, Py_None)". int PyErr_BadArgument() * Part of the Stable ABI.* This is a shorthand for "PyErr_SetString(PyExc_TypeError, message)", where *message* indicates that a built-in operation was invoked with an illegal argument. It is mostly for internal use. PyObject *PyErr_NoMemory() *Return value: Always NULL.** Part of the Stable ABI.* This is a shorthand for "PyErr_SetNone(PyExc_MemoryError)"; it returns "NULL" so an object allocation function can write "return PyErr_NoMemory();" when it runs out of memory. PyObject *PyErr_SetFromErrno(PyObject *type) *Return value: Always NULL.** Part of the Stable ABI.* This is a convenience function to raise an exception when a C library function has returned an error and set the C variable "errno". It constructs a tuple object whose first item is the integer "errno" value and whose second item is the corresponding error message (gotten from "strerror()"), and then calls "PyErr_SetObject(type, object)". On Unix, when the "errno" value is "EINTR", indicating an interrupted system call, this calls "PyErr_CheckSignals()", and if that set the error indicator, leaves it set to that. The function always returns "NULL", so a wrapper function around a system call can write "return PyErr_SetFromErrno(type);" when the system call returns an error. PyObject *PyErr_SetFromErrnoWithFilenameObject(PyObject *type, PyObject *filenameObject) *Return value: Always NULL.** Part of the Stable ABI.* Similar to "PyErr_SetFromErrno()", with the additional behavior that if *filenameObject* is not "NULL", it is passed to the constructor of *type* as a third parameter. In the case of "OSError" exception, this is used to define the "filename" attribute of the exception instance. PyObject *PyErr_SetFromErrnoWithFilenameObjects(PyObject *type, PyObject *filenameObject, PyObject *filenameObject2) *Return value: Always NULL.** Part of the Stable ABI since version 3.7.* Similar to "PyErr_SetFromErrnoWithFilenameObject()", but takes a second filename object, for raising errors when a function that takes two filenames fails. Added in version 3.4. 
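A sketch of the wrapper-function pattern mentioned above, assuming <unistd.h> is included and "fd" is a file descriptor obtained elsewhere:

   if (close(fd) < 0) {
       /* errno was set by close(); raise OSError from it */
       return PyErr_SetFromErrno(PyExc_OSError);
   }
   Py_RETURN_NONE;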
PyObject *PyErr_SetFromErrnoWithFilename(PyObject *type, const char *filename) *Return value: Always NULL.** Part of the Stable ABI.* Similar to "PyErr_SetFromErrnoWithFilenameObject()", but the filename is given as a C string. *filename* is decoded from the *filesystem encoding and error handler*. PyObject *PyErr_SetFromWindowsErr(int ierr) *Return value: Always NULL.** Part of the Stable ABI on Windows since version 3.7.* This is a convenience function to raise "OSError". If called with *ierr* of "0", the error code returned by a call to "GetLastError()" is used instead. It calls the Win32 function "FormatMessage()" to retrieve the Windows description of the error code given by *ierr* or "GetLastError()", then it constructs an "OSError" object with the "winerror" attribute set to the error code, the "strerror" attribute set to the corresponding error message (gotten from "FormatMessage()"), and then calls "PyErr_SetObject(PyExc_OSError, object)". This function always returns "NULL". Availability: Windows. PyObject *PyErr_SetExcFromWindowsErr(PyObject *type, int ierr) *Return value: Always NULL.** Part of the Stable ABI on Windows since version 3.7.* Similar to "PyErr_SetFromWindowsErr()", with an additional parameter specifying the exception type to be raised. Availability: Windows. PyObject *PyErr_SetFromWindowsErrWithFilename(int ierr, const char *filename) *Return value: Always NULL.** Part of the Stable ABI on Windows since version 3.7.* Similar to "PyErr_SetFromWindowsErr()", with the additional behavior that if *filename* is not "NULL", it is decoded from the filesystem encoding ("os.fsdecode()") and passed to the constructor of "OSError" as a third parameter to be used to define the "filename" attribute of the exception instance. Availability: Windows. PyObject *PyErr_SetExcFromWindowsErrWithFilenameObject(PyObject *type, int ierr, PyObject *filename) *Return value: Always NULL.** Part of the Stable ABI on Windows since version 3.7.* Similar to "PyErr_SetExcFromWindowsErr()", with the additional behavior that if *filename* is not "NULL", it is passed to the constructor of "OSError" as a third parameter to be used to define the "filename" attribute of the exception instance. Availability: Windows. PyObject *PyErr_SetExcFromWindowsErrWithFilenameObjects(PyObject *type, int ierr, PyObject *filename, PyObject *filename2) *Return value: Always NULL.** Part of the Stable ABI on Windows since version 3.7.* Similar to "PyErr_SetExcFromWindowsErrWithFilenameObject()", but accepts a second filename object. Availability: Windows. Added in version 3.4. PyObject *PyErr_SetExcFromWindowsErrWithFilename(PyObject *type, int ierr, const char *filename) *Return value: Always NULL.** Part of the Stable ABI on Windows since version 3.7.* Similar to "PyErr_SetFromWindowsErrWithFilename()", with an additional parameter specifying the exception type to be raised. Availability: Windows. PyObject *PyErr_SetImportError(PyObject *msg, PyObject *name, PyObject *path) *Return value: Always NULL.** Part of the Stable ABI since version 3.7.* This is a convenience function to raise "ImportError". *msg* will be set as the exception’s message string. *name* and *path*, both of which can be "NULL", will be set as the "ImportError"’s respective "name" and "path" attributes. Added in version 3.3.
PyObject *PyErr_SetImportErrorSubclass(PyObject *exception, PyObject *msg, PyObject *name, PyObject *path) *Return value: Always NULL.** Part of the Stable ABI since version 3.6.* Much like "PyErr_SetImportError()" but this function allows for specifying a subclass of "ImportError" to raise. Added in version 3.6. void PyErr_SyntaxLocationObject(PyObject *filename, int lineno, int col_offset) Set file, line, and offset information for the current exception. If the current exception is not a "SyntaxError", then it sets additional attributes, which make the exception printing subsystem think the exception is a "SyntaxError". Added in version 3.4. void PyErr_SyntaxLocationEx(const char *filename, int lineno, int col_offset) * Part of the Stable ABI since version 3.7.* Like "PyErr_SyntaxLocationObject()", but *filename* is a byte string decoded from the *filesystem encoding and error handler*. Added in version 3.2. void PyErr_SyntaxLocation(const char *filename, int lineno) * Part of the Stable ABI.* Like "PyErr_SyntaxLocationEx()", but the *col_offset* parameter is omitted. void PyErr_BadInternalCall() * Part of the Stable ABI.* This is a shorthand for "PyErr_SetString(PyExc_SystemError, message)", where *message* indicates that an internal operation (e.g. a Python/C API function) was invoked with an illegal argument. It is mostly for internal use. Issuing warnings ================ Use these functions to issue warnings from C code. They mirror similar functions exported by the Python "warnings" module. They normally print a warning message to *sys.stderr*; however, it is also possible that the user has specified that warnings are to be turned into errors, and in that case they will raise an exception. It is also possible that the functions raise an exception because of a problem with the warning machinery. The return value is "0" if no exception is raised, or "-1" if an exception is raised. (It is not possible to determine whether a warning message is actually printed, nor what the reason is for the exception; this is intentional.) If an exception is raised, the caller should do its normal exception handling (for example, "Py_DECREF()" owned references and return an error value). int PyErr_WarnEx(PyObject *category, const char *message, Py_ssize_t stack_level) * Part of the Stable ABI.* Issue a warning message. The *category* argument is a warning category (see below) or "NULL"; the *message* argument is a UTF-8 encoded string. *stack_level* is a positive number giving a number of stack frames; the warning will be issued from the currently executing line of code in that stack frame. A *stack_level* of 1 is the function calling "PyErr_WarnEx()", 2 is the function above that, and so forth. Warning categories must be subclasses of "PyExc_Warning"; "PyExc_Warning" is a subclass of "PyExc_Exception"; the default warning category is "PyExc_RuntimeWarning". The standard Python warning categories are available as global variables whose names are enumerated at Standard Warning Categories. For information about warning control, see the documentation for the "warnings" module and the "-W" option in the command line documentation. There is no C API for warning control. int PyErr_WarnExplicitObject(PyObject *category, PyObject *message, PyObject *filename, int lineno, PyObject *module, PyObject *registry) Issue a warning message with explicit control over all warning attributes. This is a straightforward wrapper around the Python function "warnings.warn_explicit()"; see there for more information. 
The *module* and *registry* arguments may be set to "NULL" to get the default effect described there. Added in version 3.4. int PyErr_WarnExplicit(PyObject *category, const char *message, const char *filename, int lineno, const char *module, PyObject *registry) * Part of the Stable ABI.* Similar to "PyErr_WarnExplicitObject()" except that *message* and *module* are UTF-8 encoded strings, and *filename* is decoded from the *filesystem encoding and error handler*. int PyErr_WarnFormat(PyObject *category, Py_ssize_t stack_level, const char *format, ...) * Part of the Stable ABI.* Function similar to "PyErr_WarnEx()", but uses "PyUnicode_FromFormat()" to format the warning message. *format* is an ASCII-encoded string. Added in version 3.2. int PyErr_ResourceWarning(PyObject *source, Py_ssize_t stack_level, const char *format, ...) * Part of the Stable ABI since version 3.6.* Function similar to "PyErr_WarnFormat()", but *category* is "ResourceWarning" and it passes *source* to "warnings.WarningMessage". Added in version 3.6. Querying the error indicator ============================ PyObject *PyErr_Occurred() *Return value: Borrowed reference.** Part of the Stable ABI.* Test whether the error indicator is set. If set, return the exception *type* (the first argument to the last call to one of the "PyErr_Set*" functions or to "PyErr_Restore()"). If not set, return "NULL". You do not own a reference to the return value, so you do not need to "Py_DECREF()" it. The caller must hold the GIL. Note: Do not compare the return value to a specific exception; use "PyErr_ExceptionMatches()" instead, shown below. (The comparison could easily fail since the exception may be an instance instead of a class, in the case of a class exception, or it may be a subclass of the expected exception.) int PyErr_ExceptionMatches(PyObject *exc) * Part of the Stable ABI.* Equivalent to "PyErr_GivenExceptionMatches(PyErr_Occurred(), exc)". This should only be called when an exception is actually set; a memory access violation will occur if no exception has been raised. int PyErr_GivenExceptionMatches(PyObject *given, PyObject *exc) * Part of the Stable ABI.* Return true if the *given* exception matches the exception type in *exc*. If *exc* is a class object, this also returns true when *given* is an instance of a subclass. If *exc* is a tuple, all exception types in the tuple (and recursively in subtuples) are searched for a match. PyObject *PyErr_GetRaisedException(void) *Return value: New reference.** Part of the Stable ABI since version 3.12.* Return the exception currently being raised, clearing the error indicator at the same time. Return "NULL" if the error indicator is not set. This function is used by code that needs to catch exceptions, or code that needs to save and restore the error indicator temporarily. For example:

   {
      PyObject *exc = PyErr_GetRaisedException();

      /* ... code that might produce other errors ... */

      PyErr_SetRaisedException(exc);
   }

See also: "PyErr_GetHandledException()", to save the exception currently being handled. Added in version 3.12. void PyErr_SetRaisedException(PyObject *exc) * Part of the Stable ABI since version 3.12.* Set *exc* as the exception currently being raised, clearing the existing exception if one is set. Warning: This call steals a reference to *exc*, which must be a valid exception. Added in version 3.12. void PyErr_Fetch(PyObject **ptype, PyObject **pvalue, PyObject **ptraceback) * Part of the Stable ABI.* Deprecated since version 3.12: Use "PyErr_GetRaisedException()" instead.
Retrieve the error indicator into three variables whose addresses are passed. If the error indicator is not set, set all three variables to "NULL". If it is set, it will be cleared and you own a reference to each object retrieved. The value and traceback object may be "NULL" even when the type object is not. Note: This function is normally only used by legacy code that needs to catch exceptions or save and restore the error indicator temporarily. For example:

   {
      PyObject *type, *value, *traceback;
      PyErr_Fetch(&type, &value, &traceback);

      /* ... code that might produce other errors ... */

      PyErr_Restore(type, value, traceback);
   }

void PyErr_Restore(PyObject *type, PyObject *value, PyObject *traceback) * Part of the Stable ABI.* Deprecated since version 3.12: Use "PyErr_SetRaisedException()" instead. Set the error indicator from the three objects, *type*, *value*, and *traceback*, clearing the existing exception if one is set. If the objects are "NULL", the error indicator is cleared. Do not pass a "NULL" type and non-"NULL" value or traceback. The exception type should be a class. Do not pass an invalid exception type or value. (Violating these rules will cause subtle problems later.) This call takes away a reference to each object: you must own a reference to each object before the call and after the call you no longer own these references. (If you don’t understand this, don’t use this function. I warned you.) Note: This function is normally only used by legacy code that needs to save and restore the error indicator temporarily. Use "PyErr_Fetch()" to save the current error indicator. void PyErr_NormalizeException(PyObject **exc, PyObject **val, PyObject **tb) * Part of the Stable ABI.* Deprecated since version 3.12: Use "PyErr_GetRaisedException()" instead, to avoid any possible de-normalization. Under certain circumstances, the values returned by "PyErr_Fetch()" above can be “unnormalized”, meaning that "*exc" is a class object but "*val" is not an instance of the same class. This function can be used to instantiate the class in that case. If the values are already normalized, nothing happens. The delayed normalization is implemented to improve performance. Note: This function *does not* implicitly set the "__traceback__" attribute on the exception value. If setting the traceback appropriately is desired, the following additional snippet is needed:

   if (tb != NULL) {
       PyException_SetTraceback(val, tb);
   }

PyObject *PyErr_GetHandledException(void) * Part of the Stable ABI since version 3.11.* Retrieve the active exception instance, as would be returned by "sys.exception()". This refers to an exception that was *already caught*, not to an exception that was freshly raised. Returns a new reference to the exception or "NULL". Does not modify the interpreter’s exception state. Note: This function is not normally used by code that wants to handle exceptions. Rather, it can be used when code needs to save and restore the exception state temporarily. Use "PyErr_SetHandledException()" to restore or clear the exception state. Added in version 3.11. void PyErr_SetHandledException(PyObject *exc) * Part of the Stable ABI since version 3.11.* Set the active exception, as known from "sys.exception()". This refers to an exception that was *already caught*, not to an exception that was freshly raised. To clear the exception state, pass "NULL". Note: This function is not normally used by code that wants to handle exceptions.
Rather, it can be used when code needs to save and restore the exception state temporarily. Use "PyErr_GetHandledException()" to get the exception state. Added in version 3.11. void PyErr_GetExcInfo(PyObject **ptype, PyObject **pvalue, PyObject **ptraceback) * Part of the Stable ABI since version 3.7.* Retrieve the old-style representation of the exception info, as known from "sys.exc_info()". This refers to an exception that was *already caught*, not to an exception that was freshly raised. Returns new references for the three objects, any of which may be "NULL". Does not modify the exception info state. This function is kept for backwards compatibility. Prefer using "PyErr_GetHandledException()". Note: This function is not normally used by code that wants to handle exceptions. Rather, it can be used when code needs to save and restore the exception state temporarily. Use "PyErr_SetExcInfo()" to restore or clear the exception state. Added in version 3.3. void PyErr_SetExcInfo(PyObject *type, PyObject *value, PyObject *traceback) * Part of the Stable ABI since version 3.7.* Set the exception info, as known from "sys.exc_info()". This refers to an exception that was *already caught*, not to an exception that was freshly raised. This function steals the references of the arguments. To clear the exception state, pass "NULL" for all three arguments. This function is kept for backwards compatibility. Prefer using "PyErr_SetHandledException()". Note: This function is not normally used by code that wants to handle exceptions. Rather, it can be used when code needs to save and restore the exception state temporarily. Use "PyErr_GetExcInfo()" to read the exception state. Added in version 3.3. Changed in version 3.11: The "type" and "traceback" arguments are no longer used and can be NULL. The interpreter now derives them from the exception instance (the "value" argument). The function still steals references of all three arguments. Signal Handling =============== int PyErr_CheckSignals() * Part of the Stable ABI.* This function interacts with Python’s signal handling. If the function is called from the main thread and under the main Python interpreter, it checks whether a signal has been sent to the process and if so, invokes the corresponding signal handler. If the "signal" module is supported, this can invoke a signal handler written in Python. The function attempts to handle all pending signals, and then returns "0". However, if a Python signal handler raises an exception, the error indicator is set and the function returns "-1" immediately (such that other pending signals may not have been handled yet: they will be on the next "PyErr_CheckSignals()" invocation). If the function is called from a non-main thread, or under a non-main Python interpreter, it does nothing and returns "0". This function can be called by long-running C code that wants to be interruptible by user requests (such as by pressing Ctrl-C). Note: The default Python signal handler for "SIGINT" raises the "KeyboardInterrupt" exception. void PyErr_SetInterrupt() * Part of the Stable ABI.* Simulate the effect of a "SIGINT" signal arriving. This is equivalent to "PyErr_SetInterruptEx(SIGINT)". Note: This function is async-signal-safe. It can be called without the *GIL* and from a C signal handler. int PyErr_SetInterruptEx(int signum) * Part of the Stable ABI since version 3.10.* Simulate the effect of a signal arriving.
The next time "PyErr_CheckSignals()" is called, the Python signal handler for the given signal number will be called. This function can be called by C code that sets up its own signal handling and wants Python signal handlers to be invoked as expected when an interruption is requested (for example when the user presses Ctrl-C to interrupt an operation). If the given signal isn’t handled by Python (it was set to "signal.SIG_DFL" or "signal.SIG_IGN"), it will be ignored. If *signum* is outside of the allowed range of signal numbers, "-1" is returned. Otherwise, "0" is returned. The error indicator is never changed by this function. Note: This function is async-signal-safe. It can be called without the *GIL* and from a C signal handler. Added in version 3.10. int PySignal_SetWakeupFd(int fd) This utility function specifies a file descriptor to which the signal number is written as a single byte whenever a signal is received. *fd* must be non-blocking. It returns the previous such file descriptor. The value "-1" disables the feature; this is the initial state. This is equivalent to "signal.set_wakeup_fd()" in Python, but without any error checking. *fd* should be a valid file descriptor. The function should only be called from the main thread. Changed in version 3.5: On Windows, the function now also supports socket handles. Exception Classes ================= PyObject *PyErr_NewException(const char *name, PyObject *base, PyObject *dict) *Return value: New reference.** Part of the Stable ABI.* This utility function creates and returns a new exception class. The *name* argument must be the name of the new exception, a C string of the form "module.classname". The *base* and *dict* arguments are normally "NULL". This creates a class object derived from "Exception" (accessible in C as "PyExc_Exception"). The "__module__" attribute of the new class is set to the first part (up to the last dot) of the *name* argument, and the class name is set to the last part (after the last dot). The *base* argument can be used to specify alternate base classes; it can either be only one class or a tuple of classes. The *dict* argument can be used to specify a dictionary of class variables and methods. PyObject *PyErr_NewExceptionWithDoc(const char *name, const char *doc, PyObject *base, PyObject *dict) *Return value: New reference.** Part of the Stable ABI.* Same as "PyErr_NewException()", except that the new exception class can easily be given a docstring: If *doc* is non-"NULL", it will be used as the docstring for the exception class. Added in version 3.2. Exception Objects ================= PyObject *PyException_GetTraceback(PyObject *ex) *Return value: New reference.** Part of the Stable ABI.* Return the traceback associated with the exception as a new reference, as accessible from Python through the "__traceback__" attribute. If there is no traceback associated, this returns "NULL". int PyException_SetTraceback(PyObject *ex, PyObject *tb) * Part of the Stable ABI.* Set the traceback associated with the exception to *tb*. Use "Py_None" to clear it. PyObject *PyException_GetContext(PyObject *ex) *Return value: New reference.** Part of the Stable ABI.* Return the context (another exception instance during whose handling *ex* was raised) associated with the exception as a new reference, as accessible from Python through the "__context__" attribute. If there is no context associated, this returns "NULL". 
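For illustration, here is a minimal sketch (not part of the original documentation; the helper name and the use of "PyObject_Print()" are this example's own choices) that walks a caught exception's context chain using "PyException_GetContext()":

   #include <Python.h>
   #include <stdio.h>

   /* Print an exception instance and every exception in its
      __context__ chain. "exc" must be a valid exception instance. */
   static void
   print_context_chain(PyObject *exc)
   {
       Py_INCREF(exc);
       while (exc != NULL) {
           PyObject_Print(exc, stdout, 0);  /* return value ignored here */
           fputc('\n', stdout);
           /* New reference to the context, or NULL at the chain's end. */
           PyObject *ctx = PyException_GetContext(exc);
           Py_DECREF(exc);
           exc = ctx;
       }
   }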
void PyException_SetContext(PyObject *ex, PyObject *ctx) * Part of the Stable ABI.* Set the context associated with the exception to *ctx*. Use "NULL" to clear it. There is no type check to make sure that *ctx* is an exception instance. This steals a reference to *ctx*. PyObject *PyException_GetCause(PyObject *ex) *Return value: New reference.** Part of the Stable ABI.* Return the cause (either an exception instance, or "None", set by "raise ... from ...") associated with the exception as a new reference, as accessible from Python through the "__cause__" attribute. void PyException_SetCause(PyObject *ex, PyObject *cause) * Part of the Stable ABI.* Set the cause associated with the exception to *cause*. Use "NULL" to clear it. There is no type check to make sure that *cause* is either an exception instance or "None". This steals a reference to *cause*. The "__suppress_context__" attribute is implicitly set to "True" by this function. PyObject *PyException_GetArgs(PyObject *ex) *Return value: New reference.** Part of the Stable ABI since version 3.12.* Return "args" of exception *ex*. void PyException_SetArgs(PyObject *ex, PyObject *args) * Part of the Stable ABI since version 3.12.* Set "args" of exception *ex* to *args*. PyObject *PyUnstable_Exc_PrepReraiseStar(PyObject *orig, PyObject *excs) *This is Unstable API. It may change without warning in minor releases.* This function implements part of the interpreter’s handling of "except*". *orig* is the original exception that was caught, and *excs* is the list of the exceptions that need to be raised. This list contains the unhandled part of *orig*, if any, as well as the exceptions that were raised from the "except*" clauses (so they have a different traceback from *orig*) and those that were reraised (and have the same traceback as *orig*). Return the "ExceptionGroup" that needs to be reraised in the end, or "None" if there is nothing to reraise. Added in version 3.12. Unicode Exception Objects ========================= The following functions are used to create and modify Unicode exceptions from C. PyObject *PyUnicodeDecodeError_Create(const char *encoding, const char *object, Py_ssize_t length, Py_ssize_t start, Py_ssize_t end, const char *reason) *Return value: New reference.** Part of the Stable ABI.* Create a "UnicodeDecodeError" object with the attributes *encoding*, *object*, *length*, *start*, *end* and *reason*. *encoding* and *reason* are UTF-8 encoded strings. PyObject *PyUnicodeDecodeError_GetEncoding(PyObject *exc) PyObject *PyUnicodeEncodeError_GetEncoding(PyObject *exc) *Return value: New reference.** Part of the Stable ABI.* Return the *encoding* attribute of the given exception object. PyObject *PyUnicodeDecodeError_GetObject(PyObject *exc) PyObject *PyUnicodeEncodeError_GetObject(PyObject *exc) PyObject *PyUnicodeTranslateError_GetObject(PyObject *exc) *Return value: New reference.** Part of the Stable ABI.* Return the *object* attribute of the given exception object. int PyUnicodeDecodeError_GetStart(PyObject *exc, Py_ssize_t *start) int PyUnicodeEncodeError_GetStart(PyObject *exc, Py_ssize_t *start) int PyUnicodeTranslateError_GetStart(PyObject *exc, Py_ssize_t *start) * Part of the Stable ABI.* Get the *start* attribute of the given exception object and place it into **start*. *start* must not be "NULL". Return "0" on success, "-1" on failure.
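As an illustrative sketch (the helper name, the byte value and the reason text are this example's own), the following creates a "UnicodeDecodeError" with "PyUnicodeDecodeError_Create()" and reads back its *start* attribute:

   #include <Python.h>

   /* Build a UnicodeDecodeError for a single invalid UTF-8 byte. */
   static PyObject *
   make_decode_error(void)
   {
       const char bytes[] = "\xff";
       PyObject *exc = PyUnicodeDecodeError_Create(
           "utf-8", bytes, 1,      /* encoding, object, length */
           0, 1,                   /* start and end of the bad range */
           "invalid start byte");  /* reason */
       if (exc == NULL) {
           return NULL;
       }
       Py_ssize_t start;
       if (PyUnicodeDecodeError_GetStart(exc, &start) < 0) {
           Py_DECREF(exc);
           return NULL;
       }
       /* start is now 0, as passed to the constructor above. */
       return exc;
   }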
int PyUnicodeDecodeError_SetStart(PyObject *exc, Py_ssize_t start) int PyUnicodeEncodeError_SetStart(PyObject *exc, Py_ssize_t start) int PyUnicodeTranslateError_SetStart(PyObject *exc, Py_ssize_t start) * Part of the Stable ABI.* Set the *start* attribute of the given exception object to *start*. Return "0" on success, "-1" on failure. int PyUnicodeDecodeError_GetEnd(PyObject *exc, Py_ssize_t *end) int PyUnicodeEncodeError_GetEnd(PyObject *exc, Py_ssize_t *end) int PyUnicodeTranslateError_GetEnd(PyObject *exc, Py_ssize_t *end) * Part of the Stable ABI.* Get the *end* attribute of the given exception object and place it into **end*. *end* must not be "NULL". Return "0" on success, "-1" on failure. int PyUnicodeDecodeError_SetEnd(PyObject *exc, Py_ssize_t end) int PyUnicodeEncodeError_SetEnd(PyObject *exc, Py_ssize_t end) int PyUnicodeTranslateError_SetEnd(PyObject *exc, Py_ssize_t end) * Part of the Stable ABI.* Set the *end* attribute of the given exception object to *end*. Return "0" on success, "-1" on failure. PyObject *PyUnicodeDecodeError_GetReason(PyObject *exc) PyObject *PyUnicodeEncodeError_GetReason(PyObject *exc) PyObject *PyUnicodeTranslateError_GetReason(PyObject *exc) *Return value: New reference.** Part of the Stable ABI.* Return the *reason* attribute of the given exception object. int PyUnicodeDecodeError_SetReason(PyObject *exc, const char *reason) int PyUnicodeEncodeError_SetReason(PyObject *exc, const char *reason) int PyUnicodeTranslateError_SetReason(PyObject *exc, const char *reason) * Part of the Stable ABI.* Set the *reason* attribute of the given exception object to *reason*. Return "0" on success, "-1" on failure. Recursion Control ================= These two functions provide a way to perform safe recursive calls at the C level, both in the core and in extension modules. They are needed if the recursive code does not necessarily invoke Python code (which tracks its recursion depth automatically). They are also not needed for *tp_call* implementations because the call protocol takes care of recursion handling. int Py_EnterRecursiveCall(const char *where) * Part of the Stable ABI since version 3.9.* Marks a point where a recursive C-level call is about to be performed. If "USE_STACKCHECK" is defined, this function checks if the OS stack overflowed using "PyOS_CheckStack()". If this is the case, it sets a "MemoryError" and returns a nonzero value. The function then checks if the recursion limit is reached. If this is the case, a "RecursionError" is set and a nonzero value is returned. Otherwise, zero is returned. *where* should be a UTF-8 encoded string such as " in instance check" to be concatenated to the "RecursionError" message caused by the recursion depth limit. Changed in version 3.9: This function is now also available in the limited API. void Py_LeaveRecursiveCall(void) * Part of the Stable ABI since version 3.9.* Ends a "Py_EnterRecursiveCall()". Must be called once for each *successful* invocation of "Py_EnterRecursiveCall()". Changed in version 3.9: This function is now also available in the limited API. Properly implementing "tp_repr" for container types requires special recursion handling. In addition to protecting the stack, "tp_repr" also needs to track objects to prevent cycles. The following two functions facilitate this functionality. Effectively, these are the C equivalent to "reprlib.recursive_repr()". int Py_ReprEnter(PyObject *object) * Part of the Stable ABI.* Called at the beginning of the "tp_repr" implementation to detect cycles.
If the object has already been processed, the function returns a positive integer. In that case the "tp_repr" implementation should return a string object indicating a cycle. As examples, "dict" objects return "{...}" and "list" objects return "[...]". The function will return a negative integer if the recursion limit is reached. In that case the "tp_repr" implementation should typically return "NULL". Otherwise, the function returns zero and the "tp_repr" implementation can continue normally. void Py_ReprLeave(PyObject *object) * Part of the Stable ABI.* Ends a "Py_ReprEnter()". Must be called once for each invocation of "Py_ReprEnter()" that returns zero. Standard Exceptions =================== All standard Python exceptions are available as global variables whose names are "PyExc_" followed by the Python exception name. These have the type PyObject*; they are all class objects. For completeness, here are all the variables; each entry gives the C name, the corresponding Python name in parentheses, and a note marker where applicable:

* "PyExc_BaseException" ("BaseException") [1]

* "PyExc_BaseExceptionGroup" ("BaseExceptionGroup") [1]

* "PyExc_Exception" ("Exception") [1]

* "PyExc_ArithmeticError" ("ArithmeticError") [1]

* "PyExc_AssertionError" ("AssertionError")

* "PyExc_AttributeError" ("AttributeError")

* "PyExc_BlockingIOError" ("BlockingIOError")

* "PyExc_BrokenPipeError" ("BrokenPipeError")

* "PyExc_BufferError" ("BufferError")

* "PyExc_ChildProcessError" ("ChildProcessError")

* "PyExc_ConnectionAbortedError" ("ConnectionAbortedError")

* "PyExc_ConnectionError" ("ConnectionError")

* "PyExc_ConnectionRefusedError" ("ConnectionRefusedError")

* "PyExc_ConnectionResetError" ("ConnectionResetError")

* "PyExc_EOFError" ("EOFError")

* "PyExc_FileExistsError" ("FileExistsError")

* "PyExc_FileNotFoundError" ("FileNotFoundError")

* "PyExc_FloatingPointError" ("FloatingPointError")

* "PyExc_GeneratorExit" ("GeneratorExit")

* "PyExc_ImportError" ("ImportError")

* "PyExc_IndentationError" ("IndentationError")

* "PyExc_IndexError" ("IndexError")

* "PyExc_InterruptedError" ("InterruptedError")

* "PyExc_IsADirectoryError" ("IsADirectoryError")

* "PyExc_KeyError" ("KeyError")

* "PyExc_KeyboardInterrupt" ("KeyboardInterrupt")

* "PyExc_LookupError" ("LookupError") [1]

* "PyExc_MemoryError" ("MemoryError")

* "PyExc_ModuleNotFoundError" ("ModuleNotFoundError")

* "PyExc_NameError" ("NameError")

* "PyExc_NotADirectoryError" ("NotADirectoryError")

* "PyExc_NotImplementedError" ("NotImplementedError")

* "PyExc_OSError" ("OSError") [1]

* "PyExc_OverflowError" ("OverflowError")

* "PyExc_PermissionError" ("PermissionError")

* "PyExc_ProcessLookupError" ("ProcessLookupError")

* "PyExc_PythonFinalizationError" ("PythonFinalizationError")

* "PyExc_RecursionError" ("RecursionError")

* "PyExc_ReferenceError" ("ReferenceError")

* "PyExc_RuntimeError" ("RuntimeError")

* "PyExc_StopAsyncIteration" ("StopAsyncIteration")

* "PyExc_StopIteration" ("StopIteration")

* "PyExc_SyntaxError" ("SyntaxError")

* "PyExc_SystemError" ("SystemError")

* "PyExc_SystemExit" ("SystemExit")

* "PyExc_TabError" ("TabError")

* "PyExc_TimeoutError" ("TimeoutError")

* "PyExc_TypeError" ("TypeError")

* "PyExc_UnboundLocalError" ("UnboundLocalError")

* "PyExc_UnicodeDecodeError" ("UnicodeDecodeError")

* "PyExc_UnicodeEncodeError" ("UnicodeEncodeError")

* "PyExc_UnicodeError" ("UnicodeError")

* "PyExc_UnicodeTranslateError" ("UnicodeTranslateError")

* "PyExc_ValueError" ("ValueError")

* "PyExc_ZeroDivisionError" ("ZeroDivisionError")

Added in version 3.3: "PyExc_BlockingIOError", "PyExc_BrokenPipeError", "PyExc_ChildProcessError", "PyExc_ConnectionError", "PyExc_ConnectionAbortedError", "PyExc_ConnectionRefusedError", "PyExc_ConnectionResetError", "PyExc_FileExistsError", "PyExc_FileNotFoundError", "PyExc_InterruptedError", "PyExc_IsADirectoryError", "PyExc_NotADirectoryError", "PyExc_PermissionError", "PyExc_ProcessLookupError" and "PyExc_TimeoutError" were introduced following **PEP 3151**.

Added in version 3.5: "PyExc_StopAsyncIteration" and "PyExc_RecursionError".

Added in version 3.6: "PyExc_ModuleNotFoundError".

Added in version 3.11: "PyExc_BaseExceptionGroup".

These are compatibility aliases to "PyExc_OSError":

* "PyExc_EnvironmentError"

* "PyExc_IOError"

* "PyExc_WindowsError" [2]

Changed in version 3.3: These aliases used to be separate exception types.

Notes:

[1] This is a base class for other standard exceptions.

[2] Only defined on Windows; protect code that uses this by testing that the preprocessor macro "MS_WINDOWS" is defined.
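To show how these variables are typically used (a sketch; the helper name is this example's own), C code raises a standard exception by passing the corresponding "PyExc_" global to an error-setting function such as "PyErr_SetString()", guarding the Windows-only alias as note [2] advises:

   #include <Python.h>

   /* Set a standard exception and return NULL to signal failure. */
   static PyObject *
   failing_helper(void)
   {
   #ifdef MS_WINDOWS
       /* PyExc_WindowsError is only defined on Windows builds. */
       PyErr_SetString(PyExc_WindowsError, "something went wrong");
   #else
       PyErr_SetString(PyExc_ValueError, "something went wrong");
   #endif
       return NULL;  /* the error indicator is now set */
   }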
Standard Warning Categories =========================== All standard Python warning categories are available as global variables whose names are "PyExc_" followed by the Python exception name. These have the type PyObject*; they are all class objects. For completeness, here are all the variables; each entry gives the C name and the corresponding Python name in parentheses:

* "PyExc_Warning" ("Warning") [3]

* "PyExc_BytesWarning" ("BytesWarning")

* "PyExc_DeprecationWarning" ("DeprecationWarning")

* "PyExc_EncodingWarning" ("EncodingWarning")

* "PyExc_FutureWarning" ("FutureWarning")

* "PyExc_ImportWarning" ("ImportWarning")

* "PyExc_PendingDeprecationWarning" ("PendingDeprecationWarning")

* "PyExc_ResourceWarning" ("ResourceWarning")

* "PyExc_RuntimeWarning" ("RuntimeWarning")

* "PyExc_SyntaxWarning" ("SyntaxWarning")

* "PyExc_UnicodeWarning" ("UnicodeWarning")

* "PyExc_UserWarning" ("UserWarning")

Added in version 3.2: "PyExc_ResourceWarning".

Added in version 3.10: "PyExc_EncodingWarning".

Notes:

[3] This is a base class for other standard warning categories.

File Objects ************ These APIs are a minimal emulation of the Python 2 C API for built-in file objects, which used to rely on the buffered I/O (FILE*) support from the C standard library. In Python 3, files and streams use the new "io" module, which defines several layers over the low-level unbuffered I/O of the operating system. The functions described below are convenience C wrappers over these new APIs, and meant mostly for internal error reporting in the interpreter; third-party code is advised to access the "io" APIs instead. PyObject *PyFile_FromFd(int fd, const char *name, const char *mode, int buffering, const char *encoding, const char *errors, const char *newline, int closefd) *Return value: New reference.** Part of the Stable ABI.* Create a Python file object from the file descriptor of an already opened file *fd*. The arguments *name*, *encoding*, *errors* and *newline* can be "NULL" to use the defaults; *buffering* can be *-1* to use the default. *name* is ignored and kept for backward compatibility. Return "NULL" on failure.
For a more comprehensive description of the arguments, please refer to the "io.open()" function documentation. Warning: Since Python streams have their own buffering layer, mixing them with OS-level file descriptors can produce various issues (such as unexpected ordering of data). Changed in version 3.2: Ignore *name* attribute. int PyObject_AsFileDescriptor(PyObject *p) * Part of the Stable ABI.* Return the file descriptor associated with *p* as an int. If the object is an integer, its value is returned. If not, the object’s "fileno()" method is called if it exists; the method must return an integer, which is returned as the file descriptor value. Sets an exception and returns "-1" on failure. PyObject *PyFile_GetLine(PyObject *p, int n) *Return value: New reference.** Part of the Stable ABI.* Equivalent to "p.readline([n])", this function reads one line from the object *p*. *p* may be a file object or any object with a "readline()" method. If *n* is "0", exactly one line is read, regardless of the length of the line. If *n* is greater than "0", no more than *n* bytes will be read from the file; a partial line can be returned. In both cases, an empty string is returned if the end of the file is reached immediately. If *n* is less than "0", however, one line is read regardless of length, but "EOFError" is raised if the end of the file is reached immediately. int PyFile_SetOpenCodeHook(Py_OpenCodeHookFunction handler) Overrides the normal behavior of "io.open_code()" to pass its parameter through the provided handler. The *handler* is a function of type: typedef PyObject *(*Py_OpenCodeHookFunction)(PyObject*, void*) Equivalent of PyObject *(*)(PyObject *path, void *userData), where *path* is guaranteed to be "PyUnicodeObject". The *userData* pointer is passed into the hook function. Since hook functions may be called from different runtimes, this pointer should not refer directly to Python state. As this hook is intentionally used during import, avoid importing new modules during its execution unless they are known to be frozen or available in "sys.modules". Once a hook has been set, it cannot be removed or replaced, and later calls to "PyFile_SetOpenCodeHook()" will fail. On failure, the function returns -1 and sets an exception if the interpreter has been initialized. This function is safe to call before "Py_Initialize()". Raises an auditing event "setopencodehook" with no arguments. Added in version 3.8. int PyFile_WriteObject(PyObject *obj, PyObject *p, int flags) * Part of the Stable ABI.* Write object *obj* to file object *p*. The only supported flag for *flags* is "Py_PRINT_RAW"; if given, the "str()" of the object is written instead of the "repr()". Return "0" on success or "-1" on failure; the appropriate exception will be set. int PyFile_WriteString(const char *s, PyObject *p) * Part of the Stable ABI.* Write string *s* to file object *p*. Return "0" on success or "-1" on failure; the appropriate exception will be set. Floating-Point Objects ********************** type PyFloatObject This subtype of "PyObject" represents a Python floating-point object. PyTypeObject PyFloat_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python floating- point type. This is the same object as "float" in the Python layer. int PyFloat_Check(PyObject *p) Return true if its argument is a "PyFloatObject" or a subtype of "PyFloatObject". This function always succeeds. 
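For example, a conversion helper (a sketch; the helper name is this example's own) might combine "PyFloat_Check()" with the conversion functions described below:

   #include <Python.h>

   /* Convert obj to a C double, accepting floats and anything
      with __float__ (or __index__). Returns -1 with an exception
      set on failure, 0 on success. */
   static int
   as_double(PyObject *obj, double *out)
   {
       if (PyFloat_Check(obj)) {
           *out = PyFloat_AS_DOUBLE(obj);  /* no error checking needed */
           return 0;
       }
       *out = PyFloat_AsDouble(obj);
       if (*out == -1.0 && PyErr_Occurred()) {
           return -1;
       }
       return 0;
   }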
int PyFloat_CheckExact(PyObject *p) Return true if its argument is a "PyFloatObject", but not a subtype of "PyFloatObject". This function always succeeds. PyObject *PyFloat_FromString(PyObject *str) *Return value: New reference.** Part of the Stable ABI.* Create a "PyFloatObject" object based on the string value in *str*, or "NULL" on failure. PyObject *PyFloat_FromDouble(double v) *Return value: New reference.** Part of the Stable ABI.* Create a "PyFloatObject" object from *v*, or "NULL" on failure. double PyFloat_AsDouble(PyObject *pyfloat) * Part of the Stable ABI.* Return a C double representation of the contents of *pyfloat*. If *pyfloat* is not a Python floating-point object but has a "__float__()" method, this method will first be called to convert *pyfloat* into a float. If "__float__()" is not defined then it falls back to "__index__()". This method returns "-1.0" upon failure, so one should call "PyErr_Occurred()" to check for errors. Changed in version 3.8: Use "__index__()" if available. double PyFloat_AS_DOUBLE(PyObject *pyfloat) Return a C double representation of the contents of *pyfloat*, but without error checking. PyObject *PyFloat_GetInfo(void) *Return value: New reference.** Part of the Stable ABI.* Return a structseq instance which contains information about the precision, minimum and maximum values of a float. It’s a thin wrapper around the header file "float.h". double PyFloat_GetMax() * Part of the Stable ABI.* Return the maximum representable finite float *DBL_MAX* as C double. double PyFloat_GetMin() * Part of the Stable ABI.* Return the minimum normalized positive float *DBL_MIN* as C double. Pack and Unpack functions ========================= The pack and unpack functions provide an efficient platform-independent way to store floating-point values as byte strings. The Pack routines produce a bytes string from a C double, and the Unpack routines produce a C double from such a bytes string. The suffix (2, 4 or 8) specifies the number of bytes in the bytes string. On platforms that appear to use IEEE 754 formats these functions work by copying bits. On other platforms, the 2-byte format is identical to the IEEE 754 binary16 half-precision format, the 4-byte format (32-bit) is identical to the IEEE 754 binary32 single precision format, and the 8-byte format to the IEEE 754 binary64 double precision format, although the packing of INFs and NaNs (if such things exist on the platform) isn’t handled correctly, and attempting to unpack a bytes string containing an IEEE INF or NaN will raise an exception. On non-IEEE platforms with more precision, or larger dynamic range, than IEEE 754 supports, not all values can be packed; on non-IEEE platforms with less precision, or smaller dynamic range, not all values can be unpacked. What happens in such cases is partly accidental (alas). Added in version 3.11. Pack functions -------------- The pack routines write 2, 4 or 8 bytes, starting at *p*. *le* is an int argument, non-zero if you want the bytes string in little-endian format (exponent last, at "p+1", "p+3", or "p+6" and "p+7"), zero if you want big-endian format (exponent first, at *p*). The "PY_BIG_ENDIAN" constant can be used to select the native endianness: it is equal to "1" on a big-endian processor, or "0" on a little-endian processor. Return value: "0" if all is OK, "-1" if error (and an exception is set, most likely "OverflowError"). There are two problems on non-IEEE platforms:

* What this does is undefined if *x* is a NaN or infinity.
* "-0.0" and "+0.0" produce the same bytes string. int PyFloat_Pack2(double x, unsigned char *p, int le) Pack a C double as the IEEE 754 binary16 half-precision format. int PyFloat_Pack4(double x, unsigned char *p, int le) Pack a C double as the IEEE 754 binary32 single precision format. int PyFloat_Pack8(double x, unsigned char *p, int le) Pack a C double as the IEEE 754 binary64 double precision format. Unpack functions ---------------- The unpack routines read 2, 4 or 8 bytes, starting at *p*. *le* is an int argument, non-zero if the bytes string is in little-endian format (exponent last, at "p+1", "p+3" or "p+6" and "p+7"), zero if big- endian (exponent first, at *p*). The "PY_BIG_ENDIAN" constant can be used to use the native endian: it is equal to "1" on big endian processor, or "0" on little endian processor. Return value: The unpacked double. On error, this is "-1.0" and "PyErr_Occurred()" is true (and an exception is set, most likely "OverflowError"). Note that on a non-IEEE platform this will refuse to unpack a bytes string that represents a NaN or infinity. double PyFloat_Unpack2(const unsigned char *p, int le) Unpack the IEEE 754 binary16 half-precision format as a C double. double PyFloat_Unpack4(const unsigned char *p, int le) Unpack the IEEE 754 binary32 single precision format as a C double. double PyFloat_Unpack8(const unsigned char *p, int le) Unpack the IEEE 754 binary64 double precision format as a C double. Frame Objects ************* type PyFrameObject * Part of the Limited API (as an opaque struct).* The C structure of the objects used to describe frame objects. There are no public members in this structure. Changed in version 3.11: The members of this structure were removed from the public C API. Refer to the What’s New entry for details. The "PyEval_GetFrame()" and "PyThreadState_GetFrame()" functions can be used to get a frame object. See also Reflection. PyTypeObject PyFrame_Type The type of frame objects. It is the same object as "types.FrameType" in the Python layer. Changed in version 3.11: Previously, this type was only available after including "". int PyFrame_Check(PyObject *obj) Return non-zero if *obj* is a frame object. Changed in version 3.11: Previously, this function was only available after including "". PyFrameObject *PyFrame_GetBack(PyFrameObject *frame) *Return value: New reference.* Get the *frame* next outer frame. Return a *strong reference*, or "NULL" if *frame* has no outer frame. Added in version 3.9. PyObject *PyFrame_GetBuiltins(PyFrameObject *frame) *Return value: New reference.* Get the *frame*’s "f_builtins" attribute. Return a *strong reference*. The result cannot be "NULL". Added in version 3.11. PyCodeObject *PyFrame_GetCode(PyFrameObject *frame) *Return value: New reference.** Part of the Stable ABI since version 3.10.* Get the *frame* code. Return a *strong reference*. The result (frame code) cannot be "NULL". Added in version 3.9. PyObject *PyFrame_GetGenerator(PyFrameObject *frame) *Return value: New reference.* Get the generator, coroutine, or async generator that owns this frame, or "NULL" if this frame is not owned by a generator. Does not raise an exception, even if the return value is "NULL". Return a *strong reference*, or "NULL". Added in version 3.11. PyObject *PyFrame_GetGlobals(PyFrameObject *frame) *Return value: New reference.* Get the *frame*’s "f_globals" attribute. Return a *strong reference*. The result cannot be "NULL". Added in version 3.11. 
int PyFrame_GetLasti(PyFrameObject *frame) Get the *frame*’s "f_lasti" attribute. Returns -1 if "frame.f_lasti" is "None". Added in version 3.11. PyObject *PyFrame_GetVar(PyFrameObject *frame, PyObject *name) *Return value: New reference.* Get the variable *name* of *frame*.

* Return a *strong reference* to the variable value on success.

* Raise "NameError" and return "NULL" if the variable does not exist.

* Raise an exception and return "NULL" on error.

*name* must be a "str". Added in version 3.12. PyObject *PyFrame_GetVarString(PyFrameObject *frame, const char *name) *Return value: New reference.* Similar to "PyFrame_GetVar()", but the variable name is a C string encoded in UTF-8. Added in version 3.12. PyObject *PyFrame_GetLocals(PyFrameObject *frame) *Return value: New reference.* Get the *frame*’s "f_locals" attribute. If the frame refers to an *optimized scope*, this returns a write-through proxy object that allows modifying the locals. In all other cases (classes, modules, "exec()", "eval()") it returns the mapping representing the frame locals directly (as described for "locals()"). Return a *strong reference*. Added in version 3.11. Changed in version 3.13: As part of **PEP 667**, return an instance of "PyFrameLocalsProxy_Type". int PyFrame_GetLineNumber(PyFrameObject *frame) * Part of the Stable ABI since version 3.10.* Return the line number that *frame* is currently executing. Frame Locals Proxies ==================== Added in version 3.13. The "f_locals" attribute on a frame object is an instance of a “frame-locals proxy”. The proxy object exposes a write-through view of the underlying locals dictionary for the frame. This ensures that the variables exposed by "f_locals" are always up to date with the live local variables in the frame itself. See **PEP 667** for more information. PyTypeObject PyFrameLocalsProxy_Type The type of frame "locals()" proxy objects. int PyFrameLocalsProxy_Check(PyObject *obj) Return non-zero if *obj* is a frame "locals()" proxy. Internal Frames =============== Unless using **PEP 523**, you will not need this. struct _PyInterpreterFrame The interpreter’s internal frame representation. Added in version 3.11. PyObject *PyUnstable_InterpreterFrame_GetCode(struct _PyInterpreterFrame *frame); *This is Unstable API. It may change without warning in minor releases.* Return a *strong reference* to the code object for the frame. Added in version 3.12. int PyUnstable_InterpreterFrame_GetLasti(struct _PyInterpreterFrame *frame); *This is Unstable API. It may change without warning in minor releases.* Return the byte offset into the last executed instruction. Added in version 3.12. int PyUnstable_InterpreterFrame_GetLine(struct _PyInterpreterFrame *frame); *This is Unstable API. It may change without warning in minor releases.* Return the currently executing line number, or -1 if there is no line number. Added in version 3.12. Function Objects **************** There are a few functions specific to Python functions. type PyFunctionObject The C structure used for functions. PyTypeObject PyFunction_Type This is an instance of "PyTypeObject" and represents the Python function type. It is exposed to Python programmers as "types.FunctionType". int PyFunction_Check(PyObject *o) Return true if *o* is a function object (has type "PyFunction_Type"). The parameter must not be "NULL". This function always succeeds.
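For example (a sketch; the helper name is this example's own), "PyFunction_Check()" can be combined with the accessors described below to inspect a function object safely:

   #include <Python.h>

   /* Return a new reference to the code object of a plain Python
      function, or raise TypeError for anything else. */
   static PyObject *
   function_code(PyObject *obj)
   {
       if (!PyFunction_Check(obj)) {
           PyErr_SetString(PyExc_TypeError, "expected a function object");
           return NULL;
       }
       PyObject *code = PyFunction_GetCode(obj);  /* borrowed reference */
       Py_XINCREF(code);
       return code;
   }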
PyObject *PyFunction_New(PyObject *code, PyObject *globals) *Return value: New reference.* Return a new function object associated with the code object *code*. *globals* must be a dictionary with the global variables accessible to the function. The function’s docstring and name are retrieved from the code object. "__module__" is retrieved from *globals*. The argument defaults, annotations and closure are set to "NULL". "__qualname__" is set to the same value as the code object’s "co_qualname" field. PyObject *PyFunction_NewWithQualName(PyObject *code, PyObject *globals, PyObject *qualname) *Return value: New reference.* As "PyFunction_New()", but also allows setting the function object’s "__qualname__" attribute. *qualname* should be a unicode object or "NULL"; if "NULL", the "__qualname__" attribute is set to the same value as the code object’s "co_qualname" field. Added in version 3.3. PyObject *PyFunction_GetCode(PyObject *op) *Return value: Borrowed reference.* Return the code object associated with the function object *op*. PyObject *PyFunction_GetGlobals(PyObject *op) *Return value: Borrowed reference.* Return the globals dictionary associated with the function object *op*. PyObject *PyFunction_GetModule(PyObject *op) *Return value: Borrowed reference.* Return a *borrowed reference* to the "__module__" attribute of the function object *op*. It can be *NULL*. This is normally a "string" containing the module name, but can be set to any other object by Python code. PyObject *PyFunction_GetDefaults(PyObject *op) *Return value: Borrowed reference.* Return the argument default values of the function object *op*. This can be a tuple of arguments or "NULL". int PyFunction_SetDefaults(PyObject *op, PyObject *defaults) Set the argument default values for the function object *op*. *defaults* must be "Py_None" or a tuple. Raises "SystemError" and returns "-1" on failure. void PyFunction_SetVectorcall(PyFunctionObject *func, vectorcallfunc vectorcall) Set the vectorcall field of a given function object *func*. Warning: extensions using this API must preserve the behavior of the unaltered (default) vectorcall function! Added in version 3.12. PyObject *PyFunction_GetClosure(PyObject *op) *Return value: Borrowed reference.* Return the closure associated with the function object *op*. This can be "NULL" or a tuple of cell objects. int PyFunction_SetClosure(PyObject *op, PyObject *closure) Set the closure associated with the function object *op*. *closure* must be "Py_None" or a tuple of cell objects. Raises "SystemError" and returns "-1" on failure. PyObject *PyFunction_GetAnnotations(PyObject *op) *Return value: Borrowed reference.* Return the annotations of the function object *op*. This can be a mutable dictionary or "NULL". int PyFunction_SetAnnotations(PyObject *op, PyObject *annotations) Set the annotations for the function object *op*. *annotations* must be a dictionary or "Py_None". Raises "SystemError" and returns "-1" on failure. int PyFunction_AddWatcher(PyFunction_WatchCallback callback) Register *callback* as a function watcher for the current interpreter. Return an ID which may be passed to "PyFunction_ClearWatcher()". In case of error (e.g. no more watcher IDs available), return "-1" and set an exception. Added in version 3.12. int PyFunction_ClearWatcher(int watcher_id) Clear watcher identified by *watcher_id* previously returned from "PyFunction_AddWatcher()" for the current interpreter. Return "0" on success, or "-1" and set an exception on error (e.g. 
if the given *watcher_id* was never registered.) Added in version 3.12. type PyFunction_WatchEvent Enumeration of possible function watcher events:

* "PyFunction_EVENT_CREATE"

* "PyFunction_EVENT_DESTROY"

* "PyFunction_EVENT_MODIFY_CODE"

* "PyFunction_EVENT_MODIFY_DEFAULTS"

* "PyFunction_EVENT_MODIFY_KWDEFAULTS"

Added in version 3.12. typedef int (*PyFunction_WatchCallback)(PyFunction_WatchEvent event, PyFunctionObject *func, PyObject *new_value) Type of a function watcher callback function. If *event* is "PyFunction_EVENT_CREATE" or "PyFunction_EVENT_DESTROY" then *new_value* will be "NULL". Otherwise, *new_value* will hold a *borrowed reference* to the new value that is about to be stored in *func* for the attribute that is being modified. The callback may inspect but must not modify *func*; doing so could have unpredictable effects, including infinite recursion. If *event* is "PyFunction_EVENT_CREATE", then the callback is invoked after *func* has been fully initialized. Otherwise, the callback is invoked before the modification to *func* takes place, so the prior state of *func* can be inspected. The runtime is permitted to optimize away the creation of function objects when possible. In such cases no event will be emitted. Although this creates the possibility of an observable difference of runtime behavior depending on optimization decisions, it does not change the semantics of the Python code being executed. If *event* is "PyFunction_EVENT_DESTROY", taking a reference in the callback to the about-to-be-destroyed function will resurrect it, preventing it from being freed at this time. When the resurrected object is destroyed later, any watcher callbacks active at that time will be called again. If the callback sets an exception, it must return "-1"; this exception will be printed as an unraisable exception using "PyErr_WriteUnraisable()". Otherwise it should return "0". There may already be a pending exception set on entry to the callback. In this case, the callback should return "0" with the same exception still set. This means the callback may not call any other API that can set an exception unless it saves and clears the exception state first, and restores it before returning. Added in version 3.12. Supporting Cyclic Garbage Collection ************************************ Python’s support for detecting and collecting garbage which involves circular references requires support from object types which are “containers” for other objects which may also be containers. Types which do not store references to other objects, or which only store references to atomic types (such as numbers or strings), do not need to provide any explicit support for garbage collection. To create a container type, the "tp_flags" field of the type object must include the "Py_TPFLAGS_HAVE_GC" flag and provide an implementation of the "tp_traverse" handler. If instances of the type are mutable, a "tp_clear" implementation must also be provided. "Py_TPFLAGS_HAVE_GC" Objects with a type with this flag set must conform with the rules documented here. For convenience these objects will be referred to as container objects. Constructors for container types must conform to two rules:

1. The memory for the object must be allocated using "PyObject_GC_New" or "PyObject_GC_NewVar".

2. Once all the fields which may contain references to other containers are initialized, it must call "PyObject_GC_Track()".

Similarly, the deallocator for the object must conform to a similar pair of rules:
1. Before fields which refer to other containers are invalidated, "PyObject_GC_UnTrack()" must be called.

2. The object’s memory must be deallocated using "PyObject_GC_Del()".

Warning: If a type adds the Py_TPFLAGS_HAVE_GC flag, then it *must* implement at least a "tp_traverse" handler or explicitly use one from its subclass or subclasses. When calling "PyType_Ready()" or some of the APIs that indirectly call it like "PyType_FromSpecWithBases()" or "PyType_FromSpec()" the interpreter will automatically populate the "tp_flags", "tp_traverse" and "tp_clear" fields if the type inherits from a class that implements the garbage collector protocol and the child class does *not* include the "Py_TPFLAGS_HAVE_GC" flag. PyObject_GC_New(TYPE, typeobj) Analogous to "PyObject_New" but for container objects with the "Py_TPFLAGS_HAVE_GC" flag set. PyObject_GC_NewVar(TYPE, typeobj, size) Analogous to "PyObject_NewVar" but for container objects with the "Py_TPFLAGS_HAVE_GC" flag set. PyObject *PyUnstable_Object_GC_NewWithExtraData(PyTypeObject *type, size_t extra_size) *This is Unstable API. It may change without warning in minor releases.* Analogous to "PyObject_GC_New" but allocates *extra_size* bytes at the end of the object (at offset "tp_basicsize"). The allocated memory is initialized to zeros, except for the "Python object header". The extra data will be deallocated with the object, but otherwise it is not managed by Python. Warning: The function is marked as unstable because the final mechanism for reserving extra data after an instance is not yet decided. For allocating a variable number of fields, prefer using "PyVarObject" and "tp_itemsize" instead. Added in version 3.12. PyObject_GC_Resize(TYPE, op, newsize) Resize an object allocated by "PyObject_NewVar". Returns the resized object of type "TYPE*" (refers to any C type) or "NULL" on failure. *op* must be of type PyVarObject* and must not be tracked by the collector yet. *newsize* must be of type "Py_ssize_t". void PyObject_GC_Track(PyObject *op) * Part of the Stable ABI.* Adds the object *op* to the set of container objects tracked by the collector. The collector can run at unexpected times so objects must be valid while being tracked. This should be called once all the fields followed by the "tp_traverse" handler become valid, usually near the end of the constructor. int PyObject_IS_GC(PyObject *obj) Returns non-zero if the object implements the garbage collector protocol, otherwise returns 0. The object cannot be tracked by the garbage collector if this function returns 0. int PyObject_GC_IsTracked(PyObject *op) * Part of the Stable ABI since version 3.9.* Returns 1 if the object type of *op* implements the GC protocol and *op* is currently tracked by the garbage collector and 0 otherwise. This is analogous to the Python function "gc.is_tracked()". Added in version 3.9. int PyObject_GC_IsFinalized(PyObject *op) * Part of the Stable ABI since version 3.9.* Returns 1 if the object type of *op* implements the GC protocol and *op* has already been finalized by the garbage collector and 0 otherwise. This is analogous to the Python function "gc.is_finalized()". Added in version 3.9. void PyObject_GC_Del(void *op) * Part of the Stable ABI.* Releases memory allocated to an object using "PyObject_GC_New" or "PyObject_GC_NewVar". void PyObject_GC_UnTrack(void *op) * Part of the Stable ABI.* Remove the object *op* from the set of container objects tracked by the collector.
Note that "PyObject_GC_Track()" can be called again on this object to add it back to the set of tracked objects. The deallocator ("tp_dealloc" handler) should call this for the object before any of the fields used by the "tp_traverse" handler become invalid. Changed in version 3.8: The "_PyObject_GC_TRACK()" and "_PyObject_GC_UNTRACK()" macros have been removed from the public C API. The "tp_traverse" handler accepts a function parameter of this type: typedef int (*visitproc)(PyObject *object, void *arg) * Part of the Stable ABI.* Type of the visitor function passed to the "tp_traverse" handler. The function should be called with an object to traverse as *object* and the third parameter to the "tp_traverse" handler as *arg*. The Python core uses several visitor functions to implement cyclic garbage detection; it’s not expected that users will need to write their own visitor functions. The "tp_traverse" handler must have the following type: typedef int (*traverseproc)(PyObject *self, visitproc visit, void *arg) * Part of the Stable ABI.* Traversal function for a container object. Implementations must call the *visit* function for each object directly contained by *self*, with the parameters to *visit* being the contained object and the *arg* value passed to the handler. The *visit* function must not be called with a "NULL" object argument. If *visit* returns a non-zero value that value should be returned immediately. To simplify writing "tp_traverse" handlers, a "Py_VISIT()" macro is provided. In order to use this macro, the "tp_traverse" implementation must name its arguments exactly *visit* and *arg*: Py_VISIT(o) If the PyObject* *o* is not "NULL", call the *visit* callback, with arguments *o* and *arg*. If *visit* returns a non-zero value, then return it. Using this macro, "tp_traverse" handlers look like: static int my_traverse(Noddy *self, visitproc visit, void *arg) { Py_VISIT(self->foo); Py_VISIT(self->bar); return 0; } The "tp_clear" handler must be of the "inquiry" type, or "NULL" if the object is immutable. typedef int (*inquiry)(PyObject *self) * Part of the Stable ABI.* Drop references that may have created reference cycles. Immutable objects do not have to define this method since they can never directly create reference cycles. Note that the object must still be valid after calling this method (don’t just call "Py_DECREF()" on a reference). The collector will call this method if it detects that this object is involved in a reference cycle. Controlling the Garbage Collector State ======================================= The C-API provides the following functions for controlling garbage collection runs. Py_ssize_t PyGC_Collect(void) * Part of the Stable ABI.* Perform a full garbage collection, if the garbage collector is enabled. (Note that "gc.collect()" runs it unconditionally.) Returns the number of collected + unreachable objects which cannot be collected. If the garbage collector is disabled or already collecting, returns "0" immediately. Errors during garbage collection are passed to "sys.unraisablehook". This function does not raise exceptions. int PyGC_Enable(void) * Part of the Stable ABI since version 3.10.* Enable the garbage collector: similar to "gc.enable()". Returns the previous state, 0 for disabled and 1 for enabled. Added in version 3.10. int PyGC_Disable(void) * Part of the Stable ABI since version 3.10.* Disable the garbage collector: similar to "gc.disable()". Returns the previous state, 0 for disabled and 1 for enabled. Added in version 3.10. 
int PyGC_IsEnabled(void) * Part of the Stable ABI since version 3.10.* Query the state of the garbage collector: similar to "gc.isenabled()". Returns the current state, 0 for disabled and 1 for enabled. Added in version 3.10. Querying Garbage Collector State ================================ The C-API provides the following interface for querying information about the garbage collector. void PyUnstable_GC_VisitObjects(gcvisitobjects_t callback, void *arg) *This is Unstable API. It may change without warning in minor releases.* Run supplied *callback* on all live GC-capable objects. *arg* is passed through to all invocations of *callback*. Warning: If new objects are (de)allocated by the callback, it is undefined whether they will be visited. Garbage collection is disabled during operation. Explicitly running a collection in the callback may lead to undefined behaviour, e.g. visiting the same objects multiple times or not at all. Added in version 3.12. typedef int (*gcvisitobjects_t)(PyObject *object, void *arg) Type of the visitor function to be passed to "PyUnstable_GC_VisitObjects()". *arg* is the same as the *arg* passed to "PyUnstable_GC_VisitObjects". Return "1" to continue iteration, return "0" to stop iteration. Other return values are reserved for now so behavior on returning anything else is undefined. Added in version 3.12. Generator Objects ***************** Generator objects are what Python uses to implement generator iterators. They are normally created by iterating over a function that yields values, rather than explicitly calling "PyGen_New()" or "PyGen_NewWithQualName()". type PyGenObject The C structure used for generator objects. PyTypeObject PyGen_Type The type object corresponding to generator objects. int PyGen_Check(PyObject *ob) Return true if *ob* is a generator object; *ob* must not be "NULL". This function always succeeds. int PyGen_CheckExact(PyObject *ob) Return true if *ob*’s type is "PyGen_Type"; *ob* must not be "NULL". This function always succeeds. PyObject *PyGen_New(PyFrameObject *frame) *Return value: New reference.* Create and return a new generator object based on the *frame* object. A reference to *frame* is stolen by this function. The argument must not be "NULL". PyObject *PyGen_NewWithQualName(PyFrameObject *frame, PyObject *name, PyObject *qualname) *Return value: New reference.* Create and return a new generator object based on the *frame* object, with "__name__" and "__qualname__" set to *name* and *qualname*. A reference to *frame* is stolen by this function. The *frame* argument must not be "NULL". PyHash API ********** See also the "PyTypeObject.tp_hash" member and Hashing of numeric types. type Py_hash_t Hash value type: signed integer. Added in version 3.2. type Py_uhash_t Hash value type: unsigned integer. Added in version 3.2. PyHASH_MODULUS The Mersenne prime "P = 2**n - 1", used for the numeric hash scheme. Added in version 3.13. PyHASH_BITS The exponent "n" of "P" in "PyHASH_MODULUS". Added in version 3.13. PyHASH_MULTIPLIER Prime multiplier used in string and various other hashes. Added in version 3.13. PyHASH_INF The hash value returned for a positive infinity. Added in version 3.13. PyHASH_IMAG The multiplier used for the imaginary part of a complex number. Added in version 3.13. type PyHash_FuncDef Hash function definition used by "PyHash_GetFuncDef()". const char *name Hash function name (UTF-8 encoded string). const int hash_bits Internal size of the hash value in bits. const int seed_bits Size of seed input in bits. Added in version 3.4.
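As a short sketch (the helper name is this example's own), this definition can be inspected via "PyHash_GetFuncDef()", documented next:

   #include <Python.h>
   #include <stdio.h>

   /* Report which hash algorithm this interpreter was built with. */
   static void
   print_hash_algorithm(void)
   {
       PyHash_FuncDef *def = PyHash_GetFuncDef();
       printf("hash: %s (%d-bit hash, %d-bit seed)\n",
              def->name, def->hash_bits, def->seed_bits);
   }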
PyHash_FuncDef *PyHash_GetFuncDef(void) Get the hash function definition. See also: **PEP 456** “Secure and interchangeable hash algorithm”. Added in version 3.4. Py_hash_t Py_HashPointer(const void *ptr) Hash a pointer value: process the pointer value as an integer (cast it to "uintptr_t" internally). The pointer is not dereferenced. The function cannot fail: it cannot return "-1". Added in version 3.13. Py_hash_t PyObject_GenericHash(PyObject *obj) Generic hashing function that is meant to be put into a type object’s "tp_hash" slot. Its result only depends on the object’s identity. **CPython implementation detail:** In CPython, it is equivalent to "Py_HashPointer()". Added in version 3.13. Importing Modules ***************** PyObject *PyImport_ImportModule(const char *name) *Return value: New reference.** Part of the Stable ABI.* This is a wrapper around "PyImport_Import()" which takes a const char* as an argument instead of a PyObject*. PyObject *PyImport_ImportModuleNoBlock(const char *name) *Return value: New reference.** Part of the Stable ABI.* This function is a deprecated alias of "PyImport_ImportModule()". Changed in version 3.3: This function used to fail immediately when the import lock was held by another thread. In Python 3.3 though, the locking scheme switched to per-module locks for most purposes, so this function’s special behaviour isn’t needed anymore. Deprecated since version 3.13, will be removed in version 3.15: Use "PyImport_ImportModule()" instead. PyObject *PyImport_ImportModuleEx(const char *name, PyObject *globals, PyObject *locals, PyObject *fromlist) *Return value: New reference.* Import a module. This is best described by referring to the built- in Python function "__import__()". The return value is a new reference to the imported module or top- level package, or "NULL" with an exception set on failure. Like for "__import__()", the return value when a submodule of a package was requested is normally the top-level package, unless a non-empty *fromlist* was given. Failing imports remove incomplete module objects, like with "PyImport_ImportModule()". PyObject *PyImport_ImportModuleLevelObject(PyObject *name, PyObject *globals, PyObject *locals, PyObject *fromlist, int level) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Import a module. This is best described by referring to the built- in Python function "__import__()", as the standard "__import__()" function calls this function directly. The return value is a new reference to the imported module or top- level package, or "NULL" with an exception set on failure. Like for "__import__()", the return value when a submodule of a package was requested is normally the top-level package, unless a non-empty *fromlist* was given. Added in version 3.3. PyObject *PyImport_ImportModuleLevel(const char *name, PyObject *globals, PyObject *locals, PyObject *fromlist, int level) *Return value: New reference.** Part of the Stable ABI.* Similar to "PyImport_ImportModuleLevelObject()", but the name is a UTF-8 encoded string instead of a Unicode object. Changed in version 3.3: Negative values for *level* are no longer accepted. PyObject *PyImport_Import(PyObject *name) *Return value: New reference.** Part of the Stable ABI.* This is a higher-level interface that calls the current “import hook function” (with an explicit *level* of 0, meaning absolute import). It invokes the "__import__()" function from the "__builtins__" of the current globals. 
This means that the import is done using whatever import hooks are installed in the current environment. This function always uses absolute imports. PyObject *PyImport_ReloadModule(PyObject *m) *Return value: New reference.** Part of the Stable ABI.* Reload a module. Return a new reference to the reloaded module, or "NULL" with an exception set on failure (the module still exists in this case). PyObject *PyImport_AddModuleRef(const char *name) *Return value: New reference.** Part of the Stable ABI since version 3.13.* Return the module object corresponding to a module name. The *name* argument may be of the form "package.module". First check the modules dictionary if there’s one there, and if not, create a new one and insert it in the modules dictionary. Return a *strong reference* to the module on success. Return "NULL" with an exception set on failure. The module name *name* is decoded from UTF-8. This function does not load or import the module; if the module wasn’t already loaded, you will get an empty module object. Use "PyImport_ImportModule()" or one of its variants to import a module. Package structures implied by a dotted name for *name* are not created if not already present. Added in version 3.13. PyObject *PyImport_AddModuleObject(PyObject *name) *Return value: Borrowed reference.** Part of the Stable ABI since version 3.7.* Similar to "PyImport_AddModuleRef()", but return a *borrowed reference* and *name* is a Python "str" object. Added in version 3.3. PyObject *PyImport_AddModule(const char *name) *Return value: Borrowed reference.** Part of the Stable ABI.* Similar to "PyImport_AddModuleRef()", but return a *borrowed reference*. PyObject *PyImport_ExecCodeModule(const char *name, PyObject *co) *Return value: New reference.** Part of the Stable ABI.* Given a module name (possibly of the form "package.module") and a code object read from a Python bytecode file or obtained from the built-in function "compile()", load the module. Return a new reference to the module object, or "NULL" with an exception set if an error occurred. *name* is removed from "sys.modules" in error cases, even if *name* was already in "sys.modules" on entry to "PyImport_ExecCodeModule()". Leaving incompletely initialized modules in "sys.modules" is dangerous, as imports of such modules have no way to know that the module object is in an unknown (and probably damaged with respect to the module author’s intents) state. The module’s "__spec__" and "__loader__" will be set, if not set already, with the appropriate values. The spec’s loader will be set to the module’s "__loader__" (if set) and to an instance of "SourceFileLoader" otherwise. The module’s "__file__" attribute will be set to the code object’s "co_filename". If applicable, "__cached__" will also be set. This function will reload the module if it was already imported. See "PyImport_ReloadModule()" for the intended way to reload a module. If *name* points to a dotted name of the form "package.module", any package structures not already created will still not be created. See also "PyImport_ExecCodeModuleEx()" and "PyImport_ExecCodeModuleWithPathnames()". Changed in version 3.12: The setting of "__cached__" and "__loader__" is deprecated. See "ModuleSpec" for alternatives. PyObject *PyImport_ExecCodeModuleEx(const char *name, PyObject *co, const char *pathname) *Return value: New reference.** Part of the Stable ABI.* Like "PyImport_ExecCodeModule()", but the "__file__" attribute of the module object is set to *pathname* if it is non-"NULL".
See also "PyImport_ExecCodeModuleWithPathnames()". PyObject *PyImport_ExecCodeModuleObject(PyObject *name, PyObject *co, PyObject *pathname, PyObject *cpathname) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Like "PyImport_ExecCodeModuleEx()", but the "__cached__" attribute of the module object is set to *cpathname* if it is non-"NULL". Of the three functions, this is the preferred one to use. Added in version 3.3. Changed in version 3.12: Setting "__cached__" is deprecated. See "ModuleSpec" for alternatives. PyObject *PyImport_ExecCodeModuleWithPathnames(const char *name, PyObject *co, const char *pathname, const char *cpathname) *Return value: New reference.** Part of the Stable ABI.* Like "PyImport_ExecCodeModuleObject()", but *name*, *pathname* and *cpathname* are UTF-8 encoded strings. Attempts are also made to figure out what the value for *pathname* should be from *cpathname* if the former is set to "NULL". Added in version 3.2. Changed in version 3.3: Uses "imp.source_from_cache()" in calculating the source path if only the bytecode path is provided. Changed in version 3.12: No longer uses the removed "imp" module. long PyImport_GetMagicNumber() * Part of the Stable ABI.* Return the magic number for Python bytecode files (a.k.a. ".pyc" file). The magic number should be present in the first four bytes of the bytecode file, in little-endian byte order. Returns "-1" on error. Changed in version 3.3: Return value of "-1" upon failure. const char *PyImport_GetMagicTag() * Part of the Stable ABI.* Return the magic tag string for **PEP 3147** format Python bytecode file names. Keep in mind that the value at "sys.implementation.cache_tag" is authoritative and should be used instead of this function. Added in version 3.2. PyObject *PyImport_GetModuleDict() *Return value: Borrowed reference.** Part of the Stable ABI.* Return the dictionary used for the module administration (a.k.a. "sys.modules"). Note that this is a per-interpreter variable. PyObject *PyImport_GetModule(PyObject *name) *Return value: New reference.** Part of the Stable ABI since version 3.8.* Return the already imported module with the given name. If the module has not been imported yet then returns "NULL" but does not set an error. Returns "NULL" and sets an error if the lookup failed. Added in version 3.7. PyObject *PyImport_GetImporter(PyObject *path) *Return value: New reference.** Part of the Stable ABI.* Return a finder object for a "sys.path"/"pkg.__path__" item *path*, possibly by fetching it from the "sys.path_importer_cache" dict. If it wasn’t yet cached, traverse "sys.path_hooks" until a hook is found that can handle the path item. Return "None" if no hook could; this tells our caller that the *path based finder* could not find a finder for this path item. Cache the result in "sys.path_importer_cache". Return a new reference to the finder object. int PyImport_ImportFrozenModuleObject(PyObject *name) * Part of the Stable ABI since version 3.7.* Load a frozen module named *name*. Return "1" for success, "0" if the module is not found, and "-1" with an exception set if the initialization failed. To access the imported module on a successful load, use "PyImport_ImportModule()". (Note the misnomer — this function would reload the module if it was already imported.) Added in version 3.3. Changed in version 3.4: The "__file__" attribute is no longer set on the module. 
int PyImport_ImportFrozenModule(const char *name) * Part of the Stable ABI.* Similar to "PyImport_ImportFrozenModuleObject()", but the name is a UTF-8 encoded string instead of a Unicode object. struct _frozen This is the structure type definition for frozen module descriptors, as generated by the **freeze** utility (see "Tools/freeze/" in the Python source distribution). Its definition, found in "Include/import.h", is: struct _frozen { const char *name; const unsigned char *code; int size; bool is_package; }; Changed in version 3.11: The new "is_package" field indicates whether the module is a package or not. This replaces setting the "size" field to a negative value. const struct _frozen *PyImport_FrozenModules This pointer is initialized to point to an array of "_frozen" records, terminated by one whose members are all "NULL" or zero. When a frozen module is imported, it is searched in this table. Third-party code could play tricks with this to provide a dynamically created collection of frozen modules. int PyImport_AppendInittab(const char *name, PyObject *(*initfunc)(void)) * Part of the Stable ABI.* Add a single module to the existing table of built-in modules. This is a convenience wrapper around "PyImport_ExtendInittab()", returning "-1" if the table could not be extended. The new module can be imported by the name *name*, and uses the function *initfunc* as the initialization function called on the first attempted import. This should be called before "Py_Initialize()". struct _inittab Structure describing a single entry in the list of built-in modules. Programs which embed Python may use an array of these structures in conjunction with "PyImport_ExtendInittab()" to provide additional built-in modules. The structure consists of two members: const char *name The module name, as an ASCII encoded string. PyObject *(*initfunc)(void) Initialization function for a module built into the interpreter. int PyImport_ExtendInittab(struct _inittab *newtab) Add a collection of modules to the table of built-in modules. The *newtab* array must end with a sentinel entry which contains "NULL" for the "name" field; failure to provide the sentinel value can result in a memory fault. Returns "0" on success or "-1" if insufficient memory could be allocated to extend the internal table. In the event of failure, no modules are added to the internal table. This must be called before "Py_Initialize()". If Python is initialized multiple times, "PyImport_AppendInittab()" or "PyImport_ExtendInittab()" must be called before each Python initialization. Python/C API Reference Manual ***************************** This manual documents the API used by C and C++ programmers who want to write extension modules or embed Python. It is a companion to Extending and Embedding the Python Interpreter, which describes the general principles of extension writing but does not document the API functions in detail. 
* Introduction
* Coding standards
* Include Files
* Useful macros
* Objects, Types and Reference Counts
* Exceptions
* Embedding Python
* Debugging Builds
* Recommended third party tools
* C API Stability
* Unstable C API
* Stable Application Binary Interface
* Platform Considerations
* Contents of Limited API
* The Very High Level Layer
* Reference Counting
* Exception Handling
* Printing and clearing
* Raising exceptions
* Issuing warnings
* Querying the error indicator
* Signal Handling
* Exception Classes
* Exception Objects
* Unicode Exception Objects
* Recursion Control
* Standard Exceptions
* Standard Warning Categories
* Utilities
* Operating System Utilities
* System Functions
* Process Control
* Importing Modules
* Data marshalling support
* Parsing arguments and building values
* String conversion and formatting
* PyHash API
* Reflection
* Codec registry and support functions
* PyTime C API
* Support for Perf Maps
* Abstract Objects Layer
* Object Protocol
* Call Protocol
* Number Protocol
* Sequence Protocol
* Mapping Protocol
* Iterator Protocol
* Buffer Protocol
* Concrete Objects Layer
* Fundamental Objects
* Numeric Objects
* Sequence Objects
* Container Objects
* Function Objects
* Other Objects
* Initialization, Finalization, and Threads
* Before Python Initialization
* Global configuration variables
* Initializing and finalizing the interpreter
* Process-wide parameters
* Thread State and the Global Interpreter Lock
* Sub-interpreter support
* Asynchronous Notifications
* Profiling and Tracing
* Reference tracing
* Advanced Debugger Support
* Thread Local Storage Support
* Synchronization Primitives
* Python Initialization Configuration
* Example
* PyWideStringList
* PyStatus
* PyPreConfig
* Preinitialize Python with PyPreConfig
* PyConfig
* Initialization with PyConfig
* Isolated Configuration
* Python Configuration
* Python Path Configuration
* Py_GetArgcArgv()
* Multi-Phase Initialization Private Provisional API
* Memory Management
* Overview
* Allocator Domains
* Raw Memory Interface
* Memory Interface
* Object allocators
* Default Memory Allocators
* Customize Memory Allocators
* Debug hooks on the Python memory allocators
* The pymalloc allocator
* The mimalloc allocator
* tracemalloc C API
* Examples
* Object Implementation Support
* Allocating Objects on the Heap
* Common Object Structures
* Type Object Structures
* Supporting Cyclic Garbage Collection
* API and ABI Versioning
* Monitoring C API
* Generating Execution Events
* Managing the Monitoring State

Initialization, Finalization, and Threads
*****************************************

See Python Initialization Configuration for details on how to configure the interpreter prior to initialization.

Before Python Initialization
============================

In an application embedding Python, the "Py_Initialize()" function must be called before using any other Python/C API functions, with the exception of a few functions and the global configuration variables.
The following functions can be safely called before Python is initialized:

* Functions that initialize the interpreter:

  * "Py_Initialize()"
  * "Py_InitializeEx()"
  * "Py_InitializeFromConfig()"
  * "Py_BytesMain()"
  * "Py_Main()"
  * the runtime pre-initialization functions covered in Python Initialization Configuration

* Configuration functions:

  * "PyImport_AppendInittab()"
  * "PyImport_ExtendInittab()"
  * "PyInitFrozenExtensions()"
  * "PyMem_SetAllocator()"
  * "PyMem_SetupDebugHooks()"
  * "PyObject_SetArenaAllocator()"
  * "Py_SetProgramName()"
  * "Py_SetPythonHome()"
  * "PySys_ResetWarnOptions()"
  * the configuration functions covered in Python Initialization Configuration

* Informative functions:

  * "Py_IsInitialized()"
  * "PyMem_GetAllocator()"
  * "PyObject_GetArenaAllocator()"
  * "Py_GetBuildInfo()"
  * "Py_GetCompiler()"
  * "Py_GetCopyright()"
  * "Py_GetPlatform()"
  * "Py_GetVersion()"

* Utilities:

  * "Py_DecodeLocale()"
  * the status reporting and utility functions covered in Python Initialization Configuration

* Memory allocators:

  * "PyMem_RawMalloc()"
  * "PyMem_RawRealloc()"
  * "PyMem_RawCalloc()"
  * "PyMem_RawFree()"

* Synchronization:

  * "PyMutex_Lock()"
  * "PyMutex_Unlock()"

Note: Despite their apparent similarity to some of the functions listed above, the following functions **should not be called** before the interpreter has been initialized: "Py_EncodeLocale()", "Py_GetPath()", "Py_GetPrefix()", "Py_GetExecPrefix()", "Py_GetProgramFullPath()", "Py_GetPythonHome()", "Py_GetProgramName()", "PyEval_InitThreads()", and "Py_RunMain()". Global configuration variables ============================== Python has variables for the global configuration to control different features and options. By default, these flags are controlled by command line options. When a flag is set by an option, the value of the flag is the number of times that the option was set. For example, "-b" sets "Py_BytesWarningFlag" to 1 and "-bb" sets "Py_BytesWarningFlag" to 2. int Py_BytesWarningFlag This API is kept for backward compatibility: setting "PyConfig.bytes_warning" should be used instead, see Python Initialization Configuration. Issue a warning when comparing "bytes" or "bytearray" with "str" or "bytes" with "int". Issue an error if greater or equal to "2". Set by the "-b" option. Deprecated since version 3.12, will be removed in version 3.14. int Py_DebugFlag This API is kept for backward compatibility: setting "PyConfig.parser_debug" should be used instead, see Python Initialization Configuration. Turn on parser debugging output (for experts only, depending on compilation options). Set by the "-d" option and the "PYTHONDEBUG" environment variable. Deprecated since version 3.12, will be removed in version 3.14. int Py_DontWriteBytecodeFlag This API is kept for backward compatibility: setting "PyConfig.write_bytecode" should be used instead, see Python Initialization Configuration. If set to non-zero, Python won’t try to write ".pyc" files on the import of source modules. Set by the "-B" option and the "PYTHONDONTWRITEBYTECODE" environment variable. Deprecated since version 3.12, will be removed in version 3.14. int Py_FrozenFlag This API is kept for backward compatibility: setting "PyConfig.pathconfig_warnings" should be used instead, see Python Initialization Configuration. Suppress error messages when calculating the module search path in "Py_GetPath()". Private flag used by "_freeze_module" and "frozenmain" programs. Deprecated since version 3.12, will be removed in version 3.14.
int Py_HashRandomizationFlag This API is kept for backward compatibility: setting "PyConfig.hash_seed" and "PyConfig.use_hash_seed" should be used instead, see Python Initialization Configuration. Set to "1" if the "PYTHONHASHSEED" environment variable is set to a non-empty string. If the flag is non-zero, read the "PYTHONHASHSEED" environment variable to initialize the secret hash seed. Deprecated since version 3.12, will be removed in version 3.14. int Py_IgnoreEnvironmentFlag This API is kept for backward compatibility: setting "PyConfig.use_environment" should be used instead, see Python Initialization Configuration. Ignore all "PYTHON*" environment variables, e.g. "PYTHONPATH" and "PYTHONHOME", that might be set. Set by the "-E" and "-I" options. Deprecated since version 3.12, will be removed in version 3.14. int Py_InspectFlag This API is kept for backward compatibility: setting "PyConfig.inspect" should be used instead, see Python Initialization Configuration. When a script is passed as first argument or the "-c" option is used, enter interactive mode after executing the script or the command, even when "sys.stdin" does not appear to be a terminal. Set by the "-i" option and the "PYTHONINSPECT" environment variable. Deprecated since version 3.12, will be removed in version 3.14. int Py_InteractiveFlag This API is kept for backward compatibility: setting "PyConfig.interactive" should be used instead, see Python Initialization Configuration. Set by the "-i" option. Deprecated since version 3.12, will be removed in version 3.15. int Py_IsolatedFlag This API is kept for backward compatibility: setting "PyConfig.isolated" should be used instead, see Python Initialization Configuration. Run Python in isolated mode. In isolated mode "sys.path" contains neither the script’s directory nor the user’s site-packages directory. Set by the "-I" option. Added in version 3.4. Deprecated since version 3.12, will be removed in version 3.14. int Py_LegacyWindowsFSEncodingFlag This API is kept for backward compatibility: setting "PyPreConfig.legacy_windows_fs_encoding" should be used instead, see Python Initialization Configuration. If the flag is non-zero, use the "mbcs" encoding with "replace" error handler, instead of the UTF-8 encoding with "surrogatepass" error handler, for the *filesystem encoding and error handler*. Set to "1" if the "PYTHONLEGACYWINDOWSFSENCODING" environment variable is set to a non-empty string. See **PEP 529** for more details. Availability: Windows. Deprecated since version 3.12, will be removed in version 3.14. int Py_LegacyWindowsStdioFlag This API is kept for backward compatibility: setting "PyConfig.legacy_windows_stdio" should be used instead, see Python Initialization Configuration. If the flag is non-zero, use "io.FileIO" instead of "io._WindowsConsoleIO" for "sys" standard streams. Set to "1" if the "PYTHONLEGACYWINDOWSSTDIO" environment variable is set to a non-empty string. See **PEP 528** for more details. Availability: Windows. Deprecated since version 3.12, will be removed in version 3.14. int Py_NoSiteFlag This API is kept for backward compatibility: setting "PyConfig.site_import" should be used instead, see Python Initialization Configuration. Disable the import of the module "site" and the site-dependent manipulations of "sys.path" that it entails. Also disable these manipulations if "site" is explicitly imported later (call "site.main()" if you want them to be triggered). Set by the "-S" option. 
Deprecated since version 3.12, will be removed in version 3.14. int Py_NoUserSiteDirectory This API is kept for backward compatibility: setting "PyConfig.user_site_directory" should be used instead, see Python Initialization Configuration. Don’t add the "user site-packages directory" to "sys.path". Set by the "-s" and "-I" options, and the "PYTHONNOUSERSITE" environment variable. Deprecated since version 3.12, will be removed in version 3.14. int Py_OptimizeFlag This API is kept for backward compatibility: setting "PyConfig.optimization_level" should be used instead, see Python Initialization Configuration. Set by the "-O" option and the "PYTHONOPTIMIZE" environment variable. Deprecated since version 3.12, will be removed in version 3.14. int Py_QuietFlag This API is kept for backward compatibility: setting "PyConfig.quiet" should be used instead, see Python Initialization Configuration. Don’t display the copyright and version messages even in interactive mode. Set by the "-q" option. Added in version 3.2. Deprecated since version 3.12, will be removed in version 3.14. int Py_UnbufferedStdioFlag This API is kept for backward compatibility: setting "PyConfig.buffered_stdio" should be used instead, see Python Initialization Configuration. Force the stdout and stderr streams to be unbuffered. Set by the "-u" option and the "PYTHONUNBUFFERED" environment variable. Deprecated since version 3.12, will be removed in version 3.14. int Py_VerboseFlag This API is kept for backward compatibility: setting "PyConfig.verbose" should be used instead, see Python Initialization Configuration. Print a message each time a module is initialized, showing the place (filename or built-in module) from which it is loaded. If greater or equal to "2", print a message for each file that is checked for when searching for a module. Also provides information on module cleanup at exit. Set by the "-v" option and the "PYTHONVERBOSE" environment variable. Deprecated since version 3.12, will be removed in version 3.14. Initializing and finalizing the interpreter =========================================== void Py_Initialize() * Part of the Stable ABI.* Initialize the Python interpreter. In an application embedding Python, this should be called before using any other Python/C API functions; see Before Python Initialization for the few exceptions. This initializes the table of loaded modules ("sys.modules"), and creates the fundamental modules "builtins", "__main__" and "sys". It also initializes the module search path ("sys.path"). It does not set "sys.argv"; use the Python Initialization Configuration API for that. This is a no-op when called for a second time (without calling "Py_FinalizeEx()" first). There is no return value; it is a fatal error if the initialization fails. Use "Py_InitializeFromConfig()" to customize the Python Initialization Configuration. Note: On Windows, changes the console mode from "O_TEXT" to "O_BINARY", which will also affect non-Python uses of the console using the C Runtime. void Py_InitializeEx(int initsigs) * Part of the Stable ABI.* This function works like "Py_Initialize()" if *initsigs* is "1". If *initsigs* is "0", it skips initialization registration of signal handlers, which may be useful when CPython is embedded as part of a larger application. Use "Py_InitializeFromConfig()" to customize the Python Initialization Configuration. PyStatus Py_InitializeFromConfig(const PyConfig *config) Initialize Python from *config* configuration, as described in Initialization with PyConfig. 
See the Python Initialization Configuration section for details on pre-initializing the interpreter, populating the runtime configuration structure, and querying the returned status structure. int Py_IsInitialized() * Part of the Stable ABI.* Return true (nonzero) when the Python interpreter has been initialized, false (zero) if not. After "Py_FinalizeEx()" is called, this returns false until "Py_Initialize()" is called again. int Py_IsFinalizing() * Part of the Stable ABI since version 3.13.* Return true (non-zero) if the main Python interpreter is *shutting down*. Return false (zero) otherwise. Added in version 3.13. int Py_FinalizeEx() * Part of the Stable ABI since version 3.6.* Undo all initializations made by "Py_Initialize()" and subsequent use of Python/C API functions, and destroy all sub-interpreters (see "Py_NewInterpreter()" below) that were created and not yet destroyed since the last call to "Py_Initialize()". Ideally, this frees all memory allocated by the Python interpreter. This is a no-op when called for a second time (without calling "Py_Initialize()" again first). Since this is the reverse of "Py_Initialize()", it should be called in the same thread with the same interpreter active. That means the main thread and the main interpreter. This should never be called while "Py_RunMain()" is running. Normally the return value is "0". If there were errors during finalization (flushing buffered data), "-1" is returned. This function is provided for a number of reasons. An embedding application might want to restart Python without having to restart the application itself. An application that has loaded the Python interpreter from a dynamically loadable library (or DLL) might want to free all memory allocated by Python before unloading the DLL. During a hunt for memory leaks in an application a developer might want to free all memory allocated by Python before exiting from the application. **Bugs and caveats:** The destruction of modules and objects in modules is done in random order; this may cause destructors ("__del__()" methods) to fail when they depend on other objects (even functions) or modules. Dynamically loaded extension modules loaded by Python are not unloaded. Small amounts of memory allocated by the Python interpreter may not be freed (if you find a leak, please report it). Memory tied up in circular references between objects is not freed. Some memory allocated by extension modules may not be freed. Some extensions may not work properly if their initialization routine is called more than once; this can happen if an application calls "Py_Initialize()" and "Py_FinalizeEx()" more than once. Raises an auditing event "cpython._PySys_ClearAuditHooks" with no arguments. Added in version 3.6. void Py_Finalize() * Part of the Stable ABI.* This is a backwards-compatible version of "Py_FinalizeEx()" that disregards the return value. int Py_BytesMain(int argc, char **argv) * Part of the Stable ABI since version 3.8.* Similar to "Py_Main()" but *argv* is an array of bytes strings, allowing the calling application to delegate the text decoding step to the CPython runtime. Added in version 3.8. int Py_Main(int argc, wchar_t **argv) * Part of the Stable ABI.* The main program for the standard interpreter, encapsulating a full initialization/finalization cycle, as well as additional behaviour to implement reading configuration settings from the environment and command line, and then executing "__main__" in accordance with Command line.
This is made available for programs which wish to support the full CPython command line interface, rather than just embedding a Python runtime in a larger application. The *argc* and *argv* parameters are similar to those which are passed to a C program’s "main()" function, except that the *argv* entries are first converted to "wchar_t" using "Py_DecodeLocale()". It is also important to note that the argument list entries may be modified to point to strings other than those passed in (however, the contents of the strings pointed to by the argument list are not modified). The return value is "2" if the argument list does not represent a valid Python command line, and otherwise the same as "Py_RunMain()". In terms of the CPython runtime configuration APIs documented in the runtime configuration section (and without accounting for error handling), "Py_Main" is approximately equivalent to: PyConfig config; PyConfig_InitPythonConfig(&config); PyConfig_SetArgv(&config, argc, argv); Py_InitializeFromConfig(&config); PyConfig_Clear(&config); Py_RunMain(); In normal usage, an embedding application will call this function *instead* of calling "Py_Initialize()", "Py_InitializeEx()" or "Py_InitializeFromConfig()" directly, and all settings will be applied as described elsewhere in this documentation. If this function is instead called *after* a preceding runtime initialization API call, then exactly which environmental and command line configuration settings will be updated is version dependent (as it depends on which settings correctly support being modified after they have already been set once when the runtime was first initialized). int Py_RunMain(void) Executes the main module in a fully configured CPython runtime. Executes the command ("PyConfig.run_command"), the script ("PyConfig.run_filename") or the module ("PyConfig.run_module") specified on the command line or in the configuration. If none of these values are set, runs the interactive Python prompt (REPL) using the "__main__" module’s global namespace. If "PyConfig.inspect" is not set (the default), the return value will be "0" if the interpreter exits normally (that is, without raising an exception), the exit status of an unhandled "SystemExit", or "1" for any other unhandled exception. If "PyConfig.inspect" is set (such as when the "-i" option is used), rather than returning when the interpreter exits, execution will instead resume in an interactive Python prompt (REPL) using the "__main__" module’s global namespace. If the interpreter exited with an exception, it is immediately raised in the REPL session. The function return value is then determined by the way the *REPL session* terminates: "0", "1", or the status of a "SystemExit", as specified above. This function always finalizes the Python interpreter before it returns. See Python Configuration for an example of a customized Python that always runs in isolated mode using "Py_RunMain()". int PyUnstable_AtExit(PyInterpreterState *interp, void (*func)(void*), void *data) *This is Unstable API. It may change without warning in minor releases.* Register an "atexit" callback for the target interpreter *interp*. This is similar to "Py_AtExit()", but takes an explicit interpreter and data pointer for the callback. The *GIL* must be held for *interp*. Added in version 3.13. 
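Returning to "Py_RunMain()": the following is a hedged sketch of an embedding "main()" that sets "PyConfig.run_command" (as **python** "-c" would) and lets "Py_RunMain()" execute the command and finalize the interpreter. The command string is illustrative only; error handling is abbreviated:

   /* A minimal sketch of the Py_RunMain() pattern described above. */
   #include <Python.h>

   int
   main(void)
   {
       PyConfig config;
       PyConfig_InitPythonConfig(&config);

       /* Equivalent of "python -c COMMAND"; the CLI appends a newline,
          so we do the same here. */
       PyStatus status = PyConfig_SetBytesString(
           &config, &config.run_command,
           "print('hello from Py_RunMain')\n");
       if (PyStatus_Exception(status)) {
           PyConfig_Clear(&config);
           Py_ExitStatusException(status);
       }

       status = Py_InitializeFromConfig(&config);
       PyConfig_Clear(&config);
       if (PyStatus_Exception(status)) {
           Py_ExitStatusException(status);
       }

       return Py_RunMain();   /* runs the command, then finalizes */
   }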
Process-wide parameters ======================= void Py_SetProgramName(const wchar_t *name) * Part of the Stable ABI.* This API is kept for backward compatibility: setting "PyConfig.program_name" should be used instead, see Python Initialization Configuration. This function should be called before "Py_Initialize()" is called for the first time, if it is called at all. It tells the interpreter the value of the "argv[0]" argument to the "main()" function of the program (converted to wide characters). This is used by "Py_GetPath()" and some other functions below to find the Python run-time libraries relative to the interpreter executable. The default value is "'python'". The argument should point to a zero-terminated wide character string in static storage whose contents will not change for the duration of the program’s execution. No code in the Python interpreter will change the contents of this storage. Use "Py_DecodeLocale()" to decode a bytes string to get a wchar_t* string. Deprecated since version 3.11. wchar_t *Py_GetProgramName() * Part of the Stable ABI.* Return the program name set with "PyConfig.program_name", or the default. The returned string points into static storage; the caller should not modify its value. This function should not be called before "Py_Initialize()", otherwise it returns "NULL". Changed in version 3.10: It now returns "NULL" if called before "Py_Initialize()". Deprecated since version 3.13, will be removed in version 3.15: Get "sys.executable" instead. wchar_t *Py_GetPrefix() * Part of the Stable ABI.* Return the *prefix* for installed platform-independent files. This is derived through a number of complicated rules from the program name set with "PyConfig.program_name" and some environment variables; for example, if the program name is "'/usr/local/bin/python'", the prefix is "'/usr/local'". The returned string points into static storage; the caller should not modify its value. This corresponds to the **prefix** variable in the top-level "Makefile" and the "--prefix" argument to the **configure** script at build time. The value is available to Python code as "sys.base_prefix". It is only useful on Unix. See also the next function. This function should not be called before "Py_Initialize()", otherwise it returns "NULL". Changed in version 3.10: It now returns "NULL" if called before "Py_Initialize()". Deprecated since version 3.13, will be removed in version 3.15: Get "sys.base_prefix" instead, or "sys.prefix" if virtual environments need to be handled. wchar_t *Py_GetExecPrefix() * Part of the Stable ABI.* Return the *exec-prefix* for installed platform-*dependent* files. This is derived through a number of complicated rules from the program name set with "PyConfig.program_name" and some environment variables; for example, if the program name is "'/usr/local/bin/python'", the exec-prefix is "'/usr/local'". The returned string points into static storage; the caller should not modify its value. This corresponds to the **exec_prefix** variable in the top-level "Makefile" and the "--exec-prefix" argument to the **configure** script at build time. The value is available to Python code as "sys.base_exec_prefix". It is only useful on Unix. Background: The exec-prefix differs from the prefix when platform dependent files (such as executables and shared libraries) are installed in a different directory tree. In a typical installation, platform dependent files may be installed in the "/usr/local/plat" subtree while platform independent may be installed in "/usr/local". 
Generally speaking, a platform is a combination of hardware and software families, e.g. Sparc machines running the Solaris 2.x operating system are considered the same platform, but Intel machines running Solaris 2.x are another platform, and Intel machines running Linux are yet another platform. Different major revisions of the same operating system generally also form different platforms. Non-Unix operating systems are a different story; the installation strategies on those systems are so different that the prefix and exec-prefix are meaningless, and set to the empty string. Note that compiled Python bytecode files are platform independent (but not independent from the Python version by which they were compiled!). System administrators will know how to configure the **mount** or **automount** programs to share "/usr/local" between platforms while having "/usr/local/plat" be a different filesystem for each platform. This function should not be called before "Py_Initialize()", otherwise it returns "NULL". Changed in version 3.10: It now returns "NULL" if called before "Py_Initialize()". Deprecated since version 3.13, will be removed in version 3.15: Get "sys.base_exec_prefix" instead, or "sys.exec_prefix" if virtual environments need to be handled. wchar_t *Py_GetProgramFullPath() * Part of the Stable ABI.* Return the full program name of the Python executable; this is computed as a side-effect of deriving the default module search path from the program name (set by "PyConfig.program_name"). The returned string points into static storage; the caller should not modify its value. The value is available to Python code as "sys.executable". This function should not be called before "Py_Initialize()", otherwise it returns "NULL". Changed in version 3.10: It now returns "NULL" if called before "Py_Initialize()". Deprecated since version 3.13, will be removed in version 3.15: Get "sys.executable" instead. wchar_t *Py_GetPath() * Part of the Stable ABI.* Return the default module search path; this is computed from the program name (set by "PyConfig.program_name") and some environment variables. The returned string consists of a series of directory names separated by a platform dependent delimiter character. The delimiter character is "':'" on Unix and macOS, "';'" on Windows. The returned string points into static storage; the caller should not modify its value. The list "sys.path" is initialized with this value on interpreter startup; it can be (and usually is) modified later to change the search path for loading modules. This function should not be called before "Py_Initialize()", otherwise it returns "NULL". Changed in version 3.10: It now returns "NULL" if called before "Py_Initialize()". Deprecated since version 3.13, will be removed in version 3.15: Get "sys.path" instead. const char *Py_GetVersion() * Part of the Stable ABI.* Return the version of this Python interpreter. This is a string that looks something like "3.0a5+ (py3k:63103M, May 12 2008, 00:53:55) \n[GCC 4.2.3]" The first word (up to the first space character) is the current Python version; the first characters are the major and minor version separated by a period. The returned string points into static storage; the caller should not modify its value. The value is available to Python code as "sys.version". See also the "Py_Version" constant. const char *Py_GetPlatform() * Part of the Stable ABI.* Return the platform identifier for the current platform. 
On Unix, this is formed from the “official” name of the operating system, converted to lower case, followed by the major revision number; e.g., for Solaris 2.x, which is also known as SunOS 5.x, the value is "'sunos5'". On macOS, it is "'darwin'". On Windows, it is "'win'". The returned string points into static storage; the caller should not modify its value. The value is available to Python code as "sys.platform". const char *Py_GetCopyright() * Part of the Stable ABI.* Return the official copyright string for the current Python version, for example "'Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam'" The returned string points into static storage; the caller should not modify its value. The value is available to Python code as "sys.copyright". const char *Py_GetCompiler() * Part of the Stable ABI.* Return an indication of the compiler used to build the current Python version, in square brackets, for example: "[GCC 2.7.2.2]" The returned string points into static storage; the caller should not modify its value. The value is available to Python code as part of the variable "sys.version". const char *Py_GetBuildInfo() * Part of the Stable ABI.* Return information about the sequence number and build date and time of the current Python interpreter instance, for example "#67, Aug 1 1997, 22:34:28" The returned string points into static storage; the caller should not modify its value. The value is available to Python code as part of the variable "sys.version". void PySys_SetArgvEx(int argc, wchar_t **argv, int updatepath) * Part of the Stable ABI.* This API is kept for backward compatibility: setting "PyConfig.argv", "PyConfig.parse_argv" and "PyConfig.safe_path" should be used instead, see Python Initialization Configuration. Set "sys.argv" based on *argc* and *argv*. These parameters are similar to those passed to the program’s "main()" function with the difference that the first entry should refer to the script file to be executed rather than the executable hosting the Python interpreter. If there isn’t a script that will be run, the first entry in *argv* can be an empty string. If this function fails to initialize "sys.argv", a fatal condition is signalled using "Py_FatalError()". If *updatepath* is zero, this is all the function does. If *updatepath* is non-zero, the function also modifies "sys.path" according to the following algorithm: * If the name of an existing script is passed in "argv[0]", the absolute path of the directory where the script is located is prepended to "sys.path". * Otherwise (that is, if *argc* is "0" or "argv[0]" doesn’t point to an existing file name), an empty string is prepended to "sys.path", which is the same as prepending the current working directory (""."") . Use "Py_DecodeLocale()" to decode a bytes string to get a wchar_t* string. See also "PyConfig.orig_argv" and "PyConfig.argv" members of the Python Initialization Configuration. Note: It is recommended that applications embedding the Python interpreter for purposes other than executing a single script pass "0" as *updatepath*, and update "sys.path" themselves if desired. See **CVE 2008-5983**. On versions before 3.1.3, you can achieve the same effect by manually popping the first "sys.path" element after having called "PySys_SetArgv()", for example using: PyRun_SimpleString("import sys; sys.path.pop(0)\n"); Added in version 3.1.3. Deprecated since version 3.11.
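Since "PySys_SetArgvEx()" is deprecated, a minimal sketch of the "PyConfig"-based replacement mentioned above may be useful. The helper name "init_with_args" is ours; error handling is abbreviated:

   /* A hedged sketch: populate sys.argv through PyConfig.argv instead
      of PySys_SetArgvEx(). */
   #include <Python.h>

   static int
   init_with_args(int argc, char **argv)
   {
       PyConfig config;
       PyConfig_InitPythonConfig(&config);

       /* With parse_argv set to 0, argv becomes sys.argv unchanged;
          safe_path set to 1 skips the potentially unsafe sys.path[0]
          insertion discussed in the note above. */
       config.parse_argv = 0;
       config.safe_path = 1;

       PyStatus status = PyConfig_SetBytesArgv(&config, argc, argv);
       if (PyStatus_Exception(status)) {
           PyConfig_Clear(&config);
           return -1;
       }
       status = Py_InitializeFromConfig(&config);
       PyConfig_Clear(&config);
       return PyStatus_Exception(status) ? -1 : 0;
   }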
void PySys_SetArgv(int argc, wchar_t **argv) * Part of the Stable ABI.* This API is kept for backward compatibility: setting "PyConfig.argv" and "PyConfig.parse_argv" should be used instead, see Python Initialization Configuration. This function works like "PySys_SetArgvEx()" with *updatepath* set to "1" unless the **python** interpreter was started with the "-I" option. Use "Py_DecodeLocale()" to decode a bytes string to get a wchar_t* string. See also "PyConfig.orig_argv" and "PyConfig.argv" members of the Python Initialization Configuration. Changed in version 3.4: The *updatepath* value depends on "-I". Deprecated since version 3.11. void Py_SetPythonHome(const wchar_t *home) * Part of the Stable ABI.* This API is kept for backward compatibility: setting "PyConfig.home" should be used instead, see Python Initialization Configuration. Set the default “home” directory, that is, the location of the standard Python libraries. See "PYTHONHOME" for the meaning of the argument string. The argument should point to a zero-terminated character string in static storage whose contents will not change for the duration of the program’s execution. No code in the Python interpreter will change the contents of this storage. Use "Py_DecodeLocale()" to decode a bytes string to get a wchar_t* string. Deprecated since version 3.11. wchar_t *Py_GetPythonHome() * Part of the Stable ABI.* Return the default “home”, that is, the value set by "PyConfig.home", or the value of the "PYTHONHOME" environment variable if it is set. This function should not be called before "Py_Initialize()", otherwise it returns "NULL". Changed in version 3.10: It now returns "NULL" if called before "Py_Initialize()". Deprecated since version 3.13, will be removed in version 3.15: Get "PyConfig.home" or "PYTHONHOME" environment variable instead. Thread State and the Global Interpreter Lock ============================================ The Python interpreter is not fully thread-safe. In order to support multi-threaded Python programs, there’s a global lock, called the *global interpreter lock* or *GIL*, that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice. Therefore, the rule exists that only the thread that has acquired the *GIL* may operate on Python objects or call Python/C API functions. In order to emulate concurrency of execution, the interpreter regularly tries to switch threads (see "sys.setswitchinterval()"). The lock is also released around potentially blocking I/O operations like reading or writing a file, so that other Python threads can run in the meantime. The Python interpreter keeps some thread-specific bookkeeping information inside a data structure called "PyThreadState". There’s also one global variable pointing to the current "PyThreadState": it can be retrieved using "PyThreadState_Get()". Releasing the GIL from extension code ------------------------------------- Most extension code manipulating the *GIL* has the following simple structure:

   Save the thread state in a local variable.
   Release the global interpreter lock.
   ... Do some blocking I/O operation ...
   Reacquire the global interpreter lock.
   Restore the thread state from the local variable.
This is so common that a pair of macros exists to simplify it:

   Py_BEGIN_ALLOW_THREADS
   ... Do some blocking I/O operation ...
   Py_END_ALLOW_THREADS

The "Py_BEGIN_ALLOW_THREADS" macro opens a new block and declares a hidden local variable; the "Py_END_ALLOW_THREADS" macro closes the block. The block above expands to the following code:

   PyThreadState *_save;

   _save = PyEval_SaveThread();
   ... Do some blocking I/O operation ...
   PyEval_RestoreThread(_save);

Here is how these functions work: the global interpreter lock is used to protect the pointer to the current thread state. When releasing the lock and saving the thread state, the current thread state pointer must be retrieved before the lock is released (since another thread could immediately acquire the lock and store its own thread state in the global variable). Conversely, when acquiring the lock and restoring the thread state, the lock must be acquired before storing the thread state pointer. Note: Calling system I/O functions is the most common use case for releasing the GIL, but it can also be useful before calling long-running computations which don’t need access to Python objects, such as compression or cryptographic functions operating over memory buffers. For example, the standard "zlib" and "hashlib" modules release the GIL when compressing or hashing data. Non-Python created threads -------------------------- When threads are created using the dedicated Python APIs (such as the "threading" module), a thread state is automatically associated with them and the code shown above is therefore correct. However, when threads are created from C (for example by a third-party library with its own thread management), they don’t hold the GIL, nor is there a thread state structure for them. If you need to call Python code from these threads (often this will be part of a callback API provided by the aforementioned third-party library), you must first register these threads with the interpreter by creating a thread state data structure, then acquiring the GIL, and finally storing their thread state pointer, before you can start using the Python/C API. When you are done, you should reset the thread state pointer, release the GIL, and finally free the thread state data structure. The "PyGILState_Ensure()" and "PyGILState_Release()" functions do all of the above automatically. The typical idiom for calling into Python from a C thread is:

   PyGILState_STATE gstate;
   gstate = PyGILState_Ensure();

   /* Perform Python actions here. */
   result = CallSomeFunction();
   /* evaluate result or handle exception */

   /* Release the thread. No Python API allowed beyond this point. */
   PyGILState_Release(gstate);

Note that the "PyGILState_*" functions assume there is only one global interpreter (created automatically by "Py_Initialize()"). Python supports the creation of additional interpreters (using "Py_NewInterpreter()"), but mixing multiple interpreters and the "PyGILState_*" API is unsupported. Cautions about fork() --------------------- Another important thing to note about threads is their behaviour in the face of the C "fork()" call. On most systems with "fork()", after a process forks only the thread that issued the fork will exist. This has a concrete impact both on how locks must be handled and on all stored state in CPython’s runtime. The fact that only the “current” thread remains means any locks held by other threads will never be released.
Python solves this for "os.fork()" by acquiring the locks it uses internally before the fork, and releasing them afterwards. In addition, it resets any Lock objects in the child. When extending or embedding Python, there is no way to inform Python of additional (non-Python) locks that need to be acquired before or reset after a fork. OS facilities such as "pthread_atfork()" would need to be used to accomplish the same thing. Additionally, when extending or embedding Python, calling "fork()" directly rather than through "os.fork()" (and returning to or calling into Python) may result in a deadlock by one of Python’s internal locks being held by a thread that is defunct after the fork. "PyOS_AfterFork_Child()" tries to reset the necessary locks, but is not always able to. The fact that all other threads go away also means that CPython’s runtime state there must be cleaned up properly, which "os.fork()" does. This means finalizing all other "PyThreadState" objects belonging to the current interpreter and all other "PyInterpreterState" objects. Due to this and the special nature of the “main” interpreter, "fork()" should only be called in that interpreter’s “main” thread, where the CPython global runtime was originally initialized. The only exception is if "exec()" will be called immediately after. High-level API -------------- These are the most commonly used types and functions when writing C extension code, or when embedding the Python interpreter: type PyInterpreterState * Part of the Limited API (as an opaque struct).* This data structure represents the state shared by a number of cooperating threads. Threads belonging to the same interpreter share their module administration and a few other internal items. There are no public members in this structure. Threads belonging to different interpreters initially share nothing, except process state like available memory, open file descriptors and such. The global interpreter lock is also shared by all threads, regardless of to which interpreter they belong. type PyThreadState * Part of the Limited API (as an opaque struct).* This data structure represents the state of a single thread. The only public data member is: PyInterpreterState *interp This thread’s interpreter state. void PyEval_InitThreads() * Part of the Stable ABI.* Deprecated function which does nothing. In Python 3.6 and older, this function created the GIL if it didn’t exist. Changed in version 3.9: The function now does nothing. Changed in version 3.7: This function is now called by "Py_Initialize()", so you don’t have to call it yourself anymore. Changed in version 3.2: This function cannot be called before "Py_Initialize()" anymore. Deprecated since version 3.9. PyThreadState *PyEval_SaveThread() * Part of the Stable ABI.* Release the global interpreter lock (if it has been created) and reset the thread state to "NULL", returning the previous thread state (which is not "NULL"). If the lock has been created, the current thread must have acquired it. void PyEval_RestoreThread(PyThreadState *tstate) * Part of the Stable ABI.* Acquire the global interpreter lock (if it has been created) and set the thread state to *tstate*, which must not be "NULL". If the lock has been created, the current thread must not have acquired it, otherwise deadlock ensues. Note: Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. 
You can use "Py_IsFinalizing()" or "sys.is_finalizing()" to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination. PyThreadState *PyThreadState_Get() * Part of the Stable ABI.* Return the current thread state. The global interpreter lock must be held. When the current thread state is "NULL", this issues a fatal error (so that the caller needn’t check for "NULL"). See also "PyThreadState_GetUnchecked()". PyThreadState *PyThreadState_GetUnchecked() Similar to "PyThreadState_Get()", but don’t kill the process with a fatal error if it is NULL. The caller is responsible to check if the result is NULL. Added in version 3.13: In Python 3.5 to 3.12, the function was private and known as "_PyThreadState_UncheckedGet()". PyThreadState *PyThreadState_Swap(PyThreadState *tstate) * Part of the Stable ABI.* Swap the current thread state with the thread state given by the argument *tstate*, which may be "NULL". The *GIL* does not need to be held, but will be held upon returning if *tstate* is non-"NULL". The following functions use thread-local storage, and are not compatible with sub-interpreters: PyGILState_STATE PyGILState_Ensure() * Part of the Stable ABI.* Ensure that the current thread is ready to call the Python C API regardless of the current state of Python, or of the global interpreter lock. This may be called as many times as desired by a thread as long as each call is matched with a call to "PyGILState_Release()". In general, other thread-related APIs may be used between "PyGILState_Ensure()" and "PyGILState_Release()" calls as long as the thread state is restored to its previous state before the Release(). For example, normal usage of the "Py_BEGIN_ALLOW_THREADS" and "Py_END_ALLOW_THREADS" macros is acceptable. The return value is an opaque “handle” to the thread state when "PyGILState_Ensure()" was called, and must be passed to "PyGILState_Release()" to ensure Python is left in the same state. Even though recursive calls are allowed, these handles *cannot* be shared - each unique call to "PyGILState_Ensure()" must save the handle for its call to "PyGILState_Release()". When the function returns, the current thread will hold the GIL and be able to call arbitrary Python code. Failure is a fatal error. Note: Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use "Py_IsFinalizing()" or "sys.is_finalizing()" to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination. void PyGILState_Release(PyGILState_STATE) * Part of the Stable ABI.* Release any resources previously acquired. After this call, Python’s state will be the same as it was prior to the corresponding "PyGILState_Ensure()" call (but generally this state will be unknown to the caller, hence the use of the GILState API). Every call to "PyGILState_Ensure()" must be matched by a call to "PyGILState_Release()" on the same thread. PyThreadState *PyGILState_GetThisThreadState() * Part of the Stable ABI.* Get the current thread state for this thread. May return "NULL" if no GILState API has been used on the current thread. Note that the main thread always has such a thread-state, even if no auto-thread- state call has been made on the main thread. This is mainly a helper/diagnostic function. int PyGILState_Check() Return "1" if the current thread is holding the GIL and "0" otherwise. 
This function can be called from any thread at any time. Only if it has had its Python thread state initialized and currently is holding the GIL will it return "1". This is mainly a helper/diagnostic function. It can be useful for example in callback contexts or memory allocation functions when knowing that the GIL is locked can allow the caller to perform sensitive actions or otherwise behave differently. Added in version 3.4. The following macros are normally used without a trailing semicolon; look for example usage in the Python source distribution. Py_BEGIN_ALLOW_THREADS * Part of the Stable ABI.* This macro expands to "{ PyThreadState *_save; _save = PyEval_SaveThread();". Note that it contains an opening brace; it must be matched with a following "Py_END_ALLOW_THREADS" macro. See above for further discussion of this macro. Py_END_ALLOW_THREADS * Part of the Stable ABI.* This macro expands to "PyEval_RestoreThread(_save); }". Note that it contains a closing brace; it must be matched with an earlier "Py_BEGIN_ALLOW_THREADS" macro. See above for further discussion of this macro. Py_BLOCK_THREADS * Part of the Stable ABI.* This macro expands to "PyEval_RestoreThread(_save);": it is equivalent to "Py_END_ALLOW_THREADS" without the closing brace. Py_UNBLOCK_THREADS * Part of the Stable ABI.* This macro expands to "_save = PyEval_SaveThread();": it is equivalent to "Py_BEGIN_ALLOW_THREADS" without the opening brace and variable declaration. Low-level API ------------- All of the following functions must be called after "Py_Initialize()". Changed in version 3.7: "Py_Initialize()" now initializes the *GIL*. PyInterpreterState *PyInterpreterState_New() * Part of the Stable ABI.* Create a new interpreter state object. The global interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function. Raises an auditing event "cpython.PyInterpreterState_New" with no arguments. void PyInterpreterState_Clear(PyInterpreterState *interp) * Part of the Stable ABI.* Reset all information in an interpreter state object. The global interpreter lock must be held. Raises an auditing event "cpython.PyInterpreterState_Clear" with no arguments. void PyInterpreterState_Delete(PyInterpreterState *interp) * Part of the Stable ABI.* Destroy an interpreter state object. The global interpreter lock need not be held. The interpreter state must have been reset with a previous call to "PyInterpreterState_Clear()". PyThreadState *PyThreadState_New(PyInterpreterState *interp) * Part of the Stable ABI.* Create a new thread state object belonging to the given interpreter object. The global interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function. void PyThreadState_Clear(PyThreadState *tstate) * Part of the Stable ABI.* Reset all information in a thread state object. The global interpreter lock must be held. Changed in version 3.9: This function now calls the "PyThreadState.on_delete" callback. Previously, that happened in "PyThreadState_Delete()". Changed in version 3.13: The "PyThreadState.on_delete" callback was removed. void PyThreadState_Delete(PyThreadState *tstate) * Part of the Stable ABI.* Destroy a thread state object. The global interpreter lock need not be held. The thread state must have been reset with a previous call to "PyThreadState_Clear()". void PyThreadState_DeleteCurrent(void) Destroy the current thread state and release the global interpreter lock. 
Like "PyThreadState_Delete()", the global interpreter lock must be held. The thread state must have been reset with a previous call to "PyThreadState_Clear()". PyFrameObject *PyThreadState_GetFrame(PyThreadState *tstate) * Part of the Stable ABI since version 3.10.* Get the current frame of the Python thread state *tstate*. Return a *strong reference*. Return "NULL" if no frame is currently executing. See also "PyEval_GetFrame()". *tstate* must not be "NULL". Added in version 3.9. uint64_t PyThreadState_GetID(PyThreadState *tstate) * Part of the Stable ABI since version 3.10.* Get the unique thread state identifier of the Python thread state *tstate*. *tstate* must not be "NULL". Added in version 3.9. PyInterpreterState *PyThreadState_GetInterpreter(PyThreadState *tstate) * Part of the Stable ABI since version 3.10.* Get the interpreter of the Python thread state *tstate*. *tstate* must not be "NULL". Added in version 3.9. void PyThreadState_EnterTracing(PyThreadState *tstate) Suspend tracing and profiling in the Python thread state *tstate*. Resume them using the "PyThreadState_LeaveTracing()" function. Added in version 3.11. void PyThreadState_LeaveTracing(PyThreadState *tstate) Resume tracing and profiling in the Python thread state *tstate* suspended by the "PyThreadState_EnterTracing()" function. See also "PyEval_SetTrace()" and "PyEval_SetProfile()" functions. Added in version 3.11. PyInterpreterState *PyInterpreterState_Get(void) * Part of the Stable ABI since version 3.9.* Get the current interpreter. Issue a fatal error if there no current Python thread state or no current interpreter. It cannot return NULL. The caller must hold the GIL. Added in version 3.9. int64_t PyInterpreterState_GetID(PyInterpreterState *interp) * Part of the Stable ABI since version 3.7.* Return the interpreter’s unique ID. If there was any error in doing so then "-1" is returned and an error is set. The caller must hold the GIL. Added in version 3.7. PyObject *PyInterpreterState_GetDict(PyInterpreterState *interp) * Part of the Stable ABI since version 3.8.* Return a dictionary in which interpreter-specific data may be stored. If this function returns "NULL" then no exception has been raised and the caller should assume no interpreter-specific dict is available. This is not a replacement for "PyModule_GetState()", which extensions should use to store interpreter-specific state information. Added in version 3.8. PyObject *PyUnstable_InterpreterState_GetMainModule(PyInterpreterState *interp) *This is Unstable API. It may change without warning in minor releases.* Return a *strong reference* to the "__main__" module object for the given interpreter. The caller must hold the GIL. Added in version 3.13. typedef PyObject *(*_PyFrameEvalFunction)(PyThreadState *tstate, _PyInterpreterFrame *frame, int throwflag) Type of a frame evaluation function. The *throwflag* parameter is used by the "throw()" method of generators: if non-zero, handle the current exception. Changed in version 3.9: The function now takes a *tstate* parameter. Changed in version 3.11: The *frame* parameter changed from "PyFrameObject*" to "_PyInterpreterFrame*". _PyFrameEvalFunction _PyInterpreterState_GetEvalFrameFunc(PyInterpreterState *interp) Get the frame evaluation function. See the **PEP 523** “Adding a frame evaluation API to CPython”. Added in version 3.9. void _PyInterpreterState_SetEvalFrameFunc(PyInterpreterState *interp, _PyFrameEvalFunction eval_frame) Set the frame evaluation function. 
See the **PEP 523** “Adding a frame evaluation API to CPython”. Added in version 3.9. PyObject *PyThreadState_GetDict() *Return value: Borrowed reference.** Part of the Stable ABI.* Return a dictionary in which extensions can store thread-specific state information. Each extension should use a unique key to store state in the dictionary. It is okay to call this function when no current thread state is available. If this function returns "NULL", no exception has been raised and the caller should assume no current thread state is available. int PyThreadState_SetAsyncExc(unsigned long id, PyObject *exc) * Part of the Stable ABI.* Asynchronously raise an exception in a thread. The *id* argument is the thread id of the target thread; *exc* is the exception object to be raised. This function does not steal any references to *exc*. To prevent naive misuse, you must write your own C extension to call this. Must be called with the GIL held. Returns the number of thread states modified; this is normally one, but will be zero if the thread id isn’t found. If *exc* is "NULL", the pending exception (if any) for the thread is cleared. This raises no exceptions. Changed in version 3.7: The type of the *id* parameter changed from long to unsigned long. void PyEval_AcquireThread(PyThreadState *tstate) * Part of the Stable ABI.* Acquire the global interpreter lock and set the current thread state to *tstate*, which must not be "NULL". The lock must have been created earlier. If this thread already has the lock, deadlock ensues. Note: Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use "Py_IsFinalizing()" or "sys.is_finalizing()" to check if the interpreter is in the process of being finalized before calling this function to avoid unwanted termination. Changed in version 3.8: Updated to be consistent with "PyEval_RestoreThread()", "Py_END_ALLOW_THREADS()", and "PyGILState_Ensure()", and terminate the current thread if called while the interpreter is finalizing. "PyEval_RestoreThread()" is a higher-level function which is always available (even when threads have not been initialized). void PyEval_ReleaseThread(PyThreadState *tstate) * Part of the Stable ABI.* Reset the current thread state to "NULL" and release the global interpreter lock. The lock must have been created earlier and must be held by the current thread. The *tstate* argument, which must not be "NULL", is only used to check that it represents the current thread state — if it isn’t, a fatal error is reported. "PyEval_SaveThread()" is a higher-level function which is always available (even when threads have not been initialized). Sub-interpreter support ======================= While in most uses, you will only embed a single Python interpreter, there are cases where you need to create several independent interpreters in the same process and perhaps even in the same thread. Sub-interpreters allow you to do that. The “main” interpreter is the first one created when the runtime initializes. It is usually the only Python interpreter in a process. Unlike sub-interpreters, the main interpreter has unique process-global responsibilities like signal handling. It is also responsible for execution during runtime initialization and is usually the active interpreter during runtime finalization. The "PyInterpreterState_Main()" function returns a pointer to its state. You can switch between sub-interpreters using the "PyThreadState_Swap()" function.
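As an illustration, here is a minimal, hedged sketch of switching a thread between interpreters with "PyThreadState_Swap()". The helper name and the way the two thread states were obtained are assumptions of the example, not part of the API:

   /* Hypothetical helper: run some code in a sub-interpreter whose
      thread state was obtained earlier (e.g. from Py_NewInterpreter()). */
   static void
   run_in_sub(PyThreadState *sub_tstate, const char *code)
   {
       /* Make the sub-interpreter's thread state current; the previously
          current thread state is returned so it can be restored later. */
       PyThreadState *prev = PyThreadState_Swap(sub_tstate);

       /* Python code executed here runs in the sub-interpreter. */
       PyRun_SimpleString(code);

       /* Restore the thread state that was current on entry. */
       PyThreadState_Swap(prev);
   }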
You can create and destroy them using the following functions: type PyInterpreterConfig Structure containing most parameters to configure a sub-interpreter. Its values are used only in "Py_NewInterpreterFromConfig()" and never modified by the runtime. Added in version 3.12. Structure fields: int use_main_obmalloc If this is "0" then the sub-interpreter will use its own “object” allocator state. Otherwise it will use (share) the main interpreter’s. If this is "0" then "check_multi_interp_extensions" must be "1" (non-zero). If this is "1" then "gil" must not be "PyInterpreterConfig_OWN_GIL". int allow_fork If this is "0" then the runtime will not support forking the process in any thread where the sub-interpreter is currently active. Otherwise fork is unrestricted. Note that the "subprocess" module still works when fork is disallowed. int allow_exec If this is "0" then the runtime will not support replacing the current process via exec (e.g. "os.execv()") in any thread where the sub-interpreter is currently active. Otherwise exec is unrestricted. Note that the "subprocess" module still works when exec is disallowed. int allow_threads If this is "0" then the sub-interpreter’s "threading" module won’t create threads. Otherwise threads are allowed. int allow_daemon_threads If this is "0" then the sub-interpreter’s "threading" module won’t create daemon threads. Otherwise daemon threads are allowed (as long as "allow_threads" is non-zero). int check_multi_interp_extensions If this is "0" then all extension modules may be imported, including legacy (single-phase init) modules, in any thread where the sub-interpreter is currently active. Otherwise only multi-phase init extension modules (see **PEP 489**) may be imported. (Also see "Py_mod_multiple_interpreters".) This must be "1" (non-zero) if "use_main_obmalloc" is "0". int gil This determines the operation of the GIL for the sub-interpreter. It may be one of the following: PyInterpreterConfig_DEFAULT_GIL Use the default selection ("PyInterpreterConfig_SHARED_GIL"). PyInterpreterConfig_SHARED_GIL Use (share) the main interpreter’s GIL. PyInterpreterConfig_OWN_GIL Use the sub-interpreter’s own GIL. If this is "PyInterpreterConfig_OWN_GIL" then "PyInterpreterConfig.use_main_obmalloc" must be "0". PyStatus Py_NewInterpreterFromConfig(PyThreadState **tstate_p, const PyInterpreterConfig *config) Create a new sub-interpreter. This is an (almost) totally separate environment for the execution of Python code. In particular, the new interpreter has separate, independent versions of all imported modules, including the fundamental modules "builtins", "__main__" and "sys". The table of loaded modules ("sys.modules") and the module search path ("sys.path") are also separate. The new environment has no "sys.argv" variable. It has new standard I/O stream file objects "sys.stdin", "sys.stdout" and "sys.stderr" (however these refer to the same underlying file descriptors). The given *config* controls the options with which the interpreter is initialized. Upon success, *tstate_p* will be set to the first thread state created in the new sub-interpreter. This thread state is made in the current thread. Note that no actual thread is created; see the discussion of thread states below. If creation of the new interpreter is unsuccessful, *tstate_p* is set to "NULL"; no exception is set since the exception state is stored in the current thread state and there may not be a current thread state.
Like all other Python/C API functions, the global interpreter lock must be held before calling this function and is still held when it returns. Likewise a current thread state must be set on entry. On success, the returned thread state will be set as current. If the sub-interpreter is created with its own GIL then the GIL of the calling interpreter will be released. When the function returns, the new interpreter’s GIL will be held by the current thread and the previous interpreter’s GIL will remain released. Added in version 3.12. Sub-interpreters are most effective when isolated from each other, with certain functionality restricted:

   PyInterpreterConfig config = {
       .use_main_obmalloc = 0,
       .allow_fork = 0,
       .allow_exec = 0,
       .allow_threads = 1,
       .allow_daemon_threads = 0,
       .check_multi_interp_extensions = 1,
       .gil = PyInterpreterConfig_OWN_GIL,
   };
   PyThreadState *tstate = NULL;
   PyStatus status = Py_NewInterpreterFromConfig(&tstate, &config);
   if (PyStatus_Exception(status)) {
       Py_ExitStatusException(status);
   }

Note that the config is used only briefly and does not get modified. During initialization the config’s values are converted into various "PyInterpreterState" values. A read-only copy of the config may be stored internally on the "PyInterpreterState". Extension modules are shared between (sub-)interpreters as follows: * For modules using multi-phase initialization, e.g. "PyModule_FromDefAndSpec()", a separate module object is created and initialized for each interpreter. Only C-level static and global variables are shared between these module objects. * For modules using single-phase initialization, e.g. "PyModule_Create()", the first time a particular extension is imported, it is initialized normally, and a (shallow) copy of its module’s dictionary is squirreled away. When the same extension is imported by another (sub-)interpreter, a new module is initialized and filled with the contents of this copy; the extension’s "init" function is not called. Objects in the module’s dictionary thus end up shared across (sub-)interpreters, which might cause unwanted behavior (see Bugs and caveats below). Note that this is different from what happens when an extension is imported after the interpreter has been completely re-initialized by calling "Py_FinalizeEx()" and "Py_Initialize()"; in that case, the extension’s "initmodule" function *is* called again. As with multi-phase initialization, this means that only C-level static and global variables are shared between these modules. PyThreadState *Py_NewInterpreter(void) * Part of the Stable ABI.* Create a new sub-interpreter. This is essentially just a wrapper around "Py_NewInterpreterFromConfig()" with a config that preserves the existing behavior. The result is an unisolated sub-interpreter that shares the main interpreter’s GIL, allows fork/exec, allows daemon threads, and allows single-phase init modules. void Py_EndInterpreter(PyThreadState *tstate) * Part of the Stable ABI.* Destroy the (sub-)interpreter represented by the given thread state. The given thread state must be the current thread state. See the discussion of thread states below. When the call returns, the current thread state is "NULL". All thread states associated with this interpreter are destroyed. The global interpreter lock used by the target interpreter must be held before calling this function. No GIL is held when it returns. "Py_FinalizeEx()" will destroy all sub-interpreters that haven’t been explicitly destroyed at that point.
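For contrast with the isolated configuration above, here is a hedged sketch of the legacy workflow using "Py_NewInterpreter()" and "Py_EndInterpreter()"; the helper name and the error handling are illustrative assumptions:

   /* Hypothetical helper: create a sub-interpreter sharing the main
      interpreter's GIL, run some code in it, and tear it down again.
      Assumes the GIL is held and a thread state is current on entry. */
   static int
   run_in_new_subinterpreter(const char *code)
   {
       PyThreadState *save_tstate = PyThreadState_Get();

       PyThreadState *sub_tstate = Py_NewInterpreter();
       if (sub_tstate == NULL) {
           return -1;   /* creation failed */
       }

       /* sub_tstate is now the current thread state. */
       int rc = PyRun_SimpleString(code);

       /* Py_EndInterpreter() requires its argument to be the current
          thread state and leaves the current thread state NULL. */
       Py_EndInterpreter(sub_tstate);

       /* Swap back to the thread state that was current on entry. */
       PyThreadState_Swap(save_tstate);
       return rc;
   }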
A Per-Interpreter GIL --------------------- Using "Py_NewInterpreterFromConfig()" you can create a sub-interpreter that is completely isolated from other interpreters, including having its own GIL. The most important benefit of this isolation is that such an interpreter can execute Python code without being blocked by other interpreters or blocking any others. Thus a single Python process can truly take advantage of multiple CPU cores when running Python code. The isolation also encourages a different approach to concurrency than that of just using threads. (See **PEP 554**.) Using an isolated interpreter requires vigilance in preserving that isolation. That especially means not sharing any objects or mutable state without guarantees about thread-safety. Even objects that are otherwise immutable (e.g. "None", "(1, 5)") can’t normally be shared because of the refcount. One simple but less-efficient approach around this is to use a global lock around all use of some state (or object). Alternately, effectively immutable objects (like integers or strings) can be made safe in spite of their refcounts by making them *immortal*. In fact, this has been done for the builtin singletons, small integers, and a number of other builtin objects. If you preserve isolation then you will have access to proper multi-core computing without the complications that come with free-threading. Failure to preserve isolation will expose you to the full consequences of free-threading, including races and hard-to-debug crashes. Aside from that, one of the main challenges of using multiple isolated interpreters is how to communicate between them safely (not break isolation) and efficiently. The runtime and stdlib do not provide any standard approach to this yet. A future stdlib module would help mitigate the effort of preserving isolation and expose effective tools for communicating (and sharing) data between interpreters. Added in version 3.12. Bugs and caveats ---------------- Because sub-interpreters (and the main interpreter) are part of the same process, the insulation between them isn’t perfect — for example, using low-level file operations like "os.close()" they can (accidentally or maliciously) affect each other’s open files. Because of the way extensions are shared between (sub-)interpreters, some extensions may not work properly; this is especially likely when using single-phase initialization or (static) global variables. It is possible to insert objects created in one sub-interpreter into a namespace of another (sub-)interpreter; this should be avoided if possible. Special care should be taken to avoid sharing user-defined functions, methods, instances or classes between sub-interpreters, since import operations executed by such objects may affect the wrong (sub-)interpreter’s dictionary of loaded modules. It is equally important to avoid sharing objects from which the above are reachable. Also note that combining this functionality with "PyGILState_*" APIs is delicate, because these APIs assume a bijection between Python thread states and OS-level threads, an assumption broken by the presence of sub-interpreters. It is highly recommended that you don’t switch sub-interpreters between a pair of matching "PyGILState_Ensure()" and "PyGILState_Release()" calls. Furthermore, extensions (such as "ctypes") using these APIs to allow calling of Python code from non-Python created threads will probably be broken when using sub-interpreters.
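When no sub-interpreters are involved, the "PyGILState_Ensure()" / "PyGILState_Release()" pair behaves as described earlier. A minimal sketch of that matched pattern for a thread not created by Python follows; the callback name and the executed code are illustrative assumptions:

   /* Hypothetical callback invoked on a non-Python thread. */
   static void
   on_external_event(void)
   {
       PyGILState_STATE gstate = PyGILState_Ensure();

       /* The GIL is held and a thread state exists, so arbitrary
          Python C API calls are allowed here. */
       PyRun_SimpleString("print('event received')");

       /* Every PyGILState_Ensure() must be matched by a
          PyGILState_Release() with the same handle. */
       PyGILState_Release(gstate);
   }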
Asynchronous Notifications ========================== A mechanism is provided to make asynchronous notifications to the main interpreter thread. These notifications take the form of a function pointer and a void pointer argument. int Py_AddPendingCall(int (*func)(void*), void *arg) * Part of the Stable ABI.* Schedule a function to be called from the main interpreter thread. On success, "0" is returned and *func* is queued for being called in the main thread. On failure, "-1" is returned without setting any exception. When successfully queued, *func* will be *eventually* called from the main interpreter thread with the argument *arg*. It will be called asynchronously with respect to normally running Python code, but with both these conditions met: * on a *bytecode* boundary; * with the main thread holding the *global interpreter lock* (*func* can therefore use the full C API). *func* must return "0" on success, or "-1" on failure with an exception set. *func* won’t be interrupted to perform another asynchronous notification recursively, but it can still be interrupted to switch threads if the global interpreter lock is released. This function doesn’t need a current thread state to run, and it doesn’t need the global interpreter lock. To call this function in a subinterpreter, the caller must hold the GIL. Otherwise, the function *func* can be scheduled to be called from the wrong interpreter. Warning: This is a low-level function, only useful for very special cases. There is no guarantee that *func* will be called as quickly as possible. If the main thread is busy executing a system call, *func* won’t be called before the system call returns. This function is generally **not** suitable for calling Python code from arbitrary C threads. Instead, use the PyGILState API. Added in version 3.1. Changed in version 3.9: If this function is called in a subinterpreter, the function *func* is now scheduled to be called from the subinterpreter, rather than being called from the main interpreter. Each subinterpreter now has its own list of scheduled calls. Profiling and Tracing ===================== The Python interpreter provides some low-level support for attaching profiling and execution tracing facilities. These are used for profiling, debugging, and coverage analysis tools. This C interface allows the profiling or tracing code to avoid the overhead of calling through Python-level callable objects, making a direct C function call instead. The essential attributes of the facility have not changed; the interface allows trace functions to be installed per-thread, and the basic events reported to the trace function are the same as had been reported to the Python-level trace functions in previous versions. typedef int (*Py_tracefunc)(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg) The type of the trace function registered using "PyEval_SetProfile()" and "PyEval_SetTrace()". The first parameter is the object passed to the registration function as *obj*, *frame* is the frame object to which the event pertains, *what* is one of the constants "PyTrace_CALL", "PyTrace_EXCEPTION", "PyTrace_LINE", "PyTrace_RETURN", "PyTrace_C_CALL", "PyTrace_C_EXCEPTION", "PyTrace_C_RETURN", or "PyTrace_OPCODE", and *arg* depends on the value of *what*:

   +---------------------------------+------------------------------------------+
   | Value of *what*                 | Meaning of *arg*                         |
   |=================================|==========================================|
   | "PyTrace_CALL"                  | Always "Py_None".                        |
   +---------------------------------+------------------------------------------+
   | "PyTrace_EXCEPTION"             | Exception information as returned by     |
   |                                 | "sys.exc_info()".                        |
   +---------------------------------+------------------------------------------+
   | "PyTrace_LINE"                  | Always "Py_None".                        |
   +---------------------------------+------------------------------------------+
   | "PyTrace_RETURN"                | Value being returned to the caller, or   |
   |                                 | "NULL" if caused by an exception.        |
   +---------------------------------+------------------------------------------+
   | "PyTrace_C_CALL"                | Function object being called.            |
   +---------------------------------+------------------------------------------+
   | "PyTrace_C_EXCEPTION"           | Function object being called.            |
   +---------------------------------+------------------------------------------+
   | "PyTrace_C_RETURN"              | Function object being called.            |
   +---------------------------------+------------------------------------------+
   | "PyTrace_OPCODE"                | Always "Py_None".                        |
   +---------------------------------+------------------------------------------+

int PyTrace_CALL The value of the *what* parameter to a "Py_tracefunc" function when a new call to a function or method is being reported, or a new entry into a generator. Note that the creation of the iterator for a generator function is not reported as there is no control transfer to the Python bytecode in the corresponding frame. int PyTrace_EXCEPTION The value of the *what* parameter to a "Py_tracefunc" function when an exception has been raised. The callback function is called with this value for *what* after any bytecode is processed that causes the exception to become set within the frame being executed. The effect of this is that as exception propagation causes the Python stack to unwind, the callback is called upon return to each frame as the exception propagates. Only trace functions receive these events; they are not needed by the profiler. int PyTrace_LINE The value passed as the *what* parameter to a "Py_tracefunc" function (but not a profiling function) when a line-number event is being reported. It may be disabled for a frame by setting "f_trace_lines" to *0* on that frame. int PyTrace_RETURN The value for the *what* parameter to "Py_tracefunc" functions when a call is about to return. int PyTrace_C_CALL The value for the *what* parameter to "Py_tracefunc" functions when a C function is about to be called. int PyTrace_C_EXCEPTION The value for the *what* parameter to "Py_tracefunc" functions when a C function has raised an exception. int PyTrace_C_RETURN The value for the *what* parameter to "Py_tracefunc" functions when a C function has returned. int PyTrace_OPCODE The value for the *what* parameter to "Py_tracefunc" functions (but not profiling functions) when a new opcode is about to be executed. This event is not emitted by default: it must be explicitly requested by setting "f_trace_opcodes" to *1* on the frame. void PyEval_SetProfile(Py_tracefunc func, PyObject *obj) Set the profiler function to *func*. The *obj* parameter is passed to the function as its first parameter, and may be any Python object, or "NULL". If the profile function needs to maintain state, using a different value for *obj* for each thread provides a convenient and thread-safe place to store it. The profile function is called for all monitored events except "PyTrace_LINE", "PyTrace_OPCODE", and "PyTrace_EXCEPTION". See also the "sys.setprofile()" function. The caller must hold the *GIL*.
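To make the registration pattern concrete, here is a hedged sketch of a minimal C-level profiler; the counter, the function names, and keeping state in a C global (rather than in *obj*) are assumptions of the example:

   /* Count PyTrace_CALL events with a C-level profile function. */
   static Py_ssize_t call_count = 0;

   static int
   count_calls(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg)
   {
       if (what == PyTrace_CALL) {
           call_count++;
       }
       return 0;   /* 0 on success; -1 with an exception set on failure */
   }

   /* Install the profiler.  The caller must hold the GIL.  *obj* is NULL
      here because this sketch keeps its state in a C global instead. */
   static void
   install_call_counter(void)
   {
       PyEval_SetProfile(count_calls, NULL);
   }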
void PyEval_SetProfileAllThreads(Py_tracefunc func, PyObject *obj) Like "PyEval_SetProfile()" but sets the profile function in all running threads belonging to the current interpreter instead of setting it only on the current thread. The caller must hold the *GIL*. Like "PyEval_SetProfile()", this function ignores any exceptions raised while setting the profile functions in all threads. Added in version 3.12. void PyEval_SetTrace(Py_tracefunc func, PyObject *obj) Set the tracing function to *func*. This is similar to "PyEval_SetProfile()", except the tracing function does receive line-number events and per-opcode events, but does not receive any event related to C function objects being called. Any trace function registered using "PyEval_SetTrace()" will not receive "PyTrace_C_CALL", "PyTrace_C_EXCEPTION" or "PyTrace_C_RETURN" as a value for the *what* parameter. See also the "sys.settrace()" function. The caller must hold the *GIL*. void PyEval_SetTraceAllThreads(Py_tracefunc func, PyObject *obj) Like "PyEval_SetTrace()" but sets the tracing function in all running threads belonging to the current interpreter instead of setting it only on the current thread. The caller must hold the *GIL*. Like "PyEval_SetTrace()", this function ignores any exceptions raised while setting the trace functions in all threads. Added in version 3.12. Reference tracing ================= Added in version 3.13. typedef int (*PyRefTracer)(PyObject*, int event, void *data) The type of the trace function registered using "PyRefTracer_SetTracer()". The first parameter is a Python object that has just been created (when **event** is set to "PyRefTracer_CREATE") or about to be destroyed (when **event** is set to "PyRefTracer_DESTROY"). The **data** argument is the opaque pointer that was provided when "PyRefTracer_SetTracer()" was called. Added in version 3.13. int PyRefTracer_CREATE The value for the *event* parameter to "PyRefTracer" functions when a Python object has been created. int PyRefTracer_DESTROY The value for the *event* parameter to "PyRefTracer" functions when a Python object has been destroyed. int PyRefTracer_SetTracer(PyRefTracer tracer, void *data) Register a reference tracer function. The function will be called when a new Python object has been created or when an object is going to be destroyed. If **data** is provided, it must be an opaque pointer that will be provided when the tracer function is called. Return "0" on success. Set an exception and return "-1" on error. Note that tracer functions **must not** create Python objects inside, otherwise the call will be re-entrant. The tracer also **must not** clear any existing exception or set an exception. The GIL will be held every time the tracer function is called. The GIL must be held when calling this function. Added in version 3.13. PyRefTracer PyRefTracer_GetTracer(void **data) Get the registered reference tracer function and the value of the opaque data pointer that was registered when "PyRefTracer_SetTracer()" was called. If no tracer was registered, this function will return NULL and will set the **data** pointer to NULL. The GIL must be held when calling this function. Added in version 3.13. Advanced Debugger Support ========================= These functions are only intended to be used by advanced debugging tools. PyInterpreterState *PyInterpreterState_Head() Return the interpreter state object at the head of the list of all such objects. PyInterpreterState *PyInterpreterState_Main() Return the main interpreter state object.
PyInterpreterState *PyInterpreterState_Next(PyInterpreterState *interp) Return the next interpreter state object after *interp* from the list of all such objects. PyThreadState *PyInterpreterState_ThreadHead(PyInterpreterState *interp) Return the pointer to the first "PyThreadState" object in the list of threads associated with the interpreter *interp*. PyThreadState *PyThreadState_Next(PyThreadState *tstate) Return the next thread state object after *tstate* from the list of all such objects belonging to the same "PyInterpreterState" object. Thread Local Storage Support ============================ The Python interpreter provides low-level support for thread-local storage (TLS) which wraps the underlying native TLS implementation to support the Python-level thread local storage API ("threading.local"). The CPython C level APIs are similar to those offered by pthreads and Windows: use a thread key and functions to associate a void* value per thread. The GIL does *not* need to be held when calling these functions; they supply their own locking. Note that "Python.h" does not include the declaration of the TLS APIs; you need to include "pythread.h" to use thread-local storage. Note: None of these API functions handle memory management on behalf of the void* values. You need to allocate and deallocate them yourself. If the void* values happen to be PyObject*, these functions don’t do refcount operations on them either. Thread Specific Storage (TSS) API --------------------------------- The TSS API is introduced to supersede the use of the existing TLS API within the CPython interpreter. This API uses a new type "Py_tss_t" instead of int to represent thread keys. Added in version 3.7. See also: “A New C-API for Thread-Local Storage in CPython” (**PEP 539**) type Py_tss_t This data structure represents the state of a thread key, the definition of which may depend on the underlying TLS implementation, and it has an internal field representing the key’s initialization state. There are no public members in this structure. When Py_LIMITED_API is not defined, static allocation of this type by "Py_tss_NEEDS_INIT" is allowed. Py_tss_NEEDS_INIT This macro expands to the initializer for "Py_tss_t" variables. Note that this macro won’t be defined with Py_LIMITED_API. Dynamic Allocation ~~~~~~~~~~~~~~~~~~ Dynamic allocation of the "Py_tss_t", required in extension modules built with Py_LIMITED_API, where static allocation of this type is not possible due to its implementation being opaque at build time. Py_tss_t *PyThread_tss_alloc() * Part of the Stable ABI since version 3.7.* Return a value which is the same state as a value initialized with "Py_tss_NEEDS_INIT", or "NULL" in the case of dynamic allocation failure. void PyThread_tss_free(Py_tss_t *key) * Part of the Stable ABI since version 3.7.* Free the given *key* allocated by "PyThread_tss_alloc()", after first calling "PyThread_tss_delete()" to ensure any associated thread locals have been unassigned. This is a no-op if the *key* argument is "NULL". Note: A freed key becomes a dangling pointer. You should reset the key to "NULL". Methods ~~~~~~~ The parameter *key* of these functions must not be "NULL". Moreover, the behaviors of "PyThread_tss_set()" and "PyThread_tss_get()" are undefined if the given "Py_tss_t" has not been initialized by "PyThread_tss_create()". int PyThread_tss_is_created(Py_tss_t *key) * Part of the Stable ABI since version 3.7.* Return a non-zero value if the given "Py_tss_t" has been initialized by "PyThread_tss_create()".
int PyThread_tss_create(Py_tss_t *key) * Part of the Stable ABI since version 3.7.* Return a zero value on successful initialization of a TSS key. The behavior is undefined if the value pointed to by the *key* argument is not initialized by "Py_tss_NEEDS_INIT". This function can be called repeatedly on the same key – calling it on an already initialized key is a no-op and immediately returns success. void PyThread_tss_delete(Py_tss_t *key) * Part of the Stable ABI since version 3.7.* Destroy a TSS key to forget the values associated with the key across all threads, and change the key’s initialization state to uninitialized. A destroyed key can be initialized again by "PyThread_tss_create()". This function can be called repeatedly on the same key – calling it on an already destroyed key is a no-op. int PyThread_tss_set(Py_tss_t *key, void *value) * Part of the Stable ABI since version 3.7.* Return a zero value to indicate successfully associating a void* value with a TSS key in the current thread. Each thread has a distinct mapping of the key to a void* value. void *PyThread_tss_get(Py_tss_t *key) * Part of the Stable ABI since version 3.7.* Return the void* value associated with a TSS key in the current thread. This returns "NULL" if no value is associated with the key in the current thread. Thread Local Storage (TLS) API ------------------------------ Deprecated since version 3.7: This API is superseded by Thread Specific Storage (TSS) API. Note: This version of the API does not support platforms where the native TLS key is defined in a way that cannot be safely cast to "int". On such platforms, "PyThread_create_key()" will return immediately with a failure status, and the other TLS functions will all be no-ops on such platforms. Due to the compatibility problem noted above, this version of the API should not be used in new code. int PyThread_create_key() * Part of the Stable ABI.* void PyThread_delete_key(int key) * Part of the Stable ABI.* int PyThread_set_key_value(int key, void *value) * Part of the Stable ABI.* void *PyThread_get_key_value(int key) * Part of the Stable ABI.* void PyThread_delete_key_value(int key) * Part of the Stable ABI.* void PyThread_ReInitTLS() * Part of the Stable ABI.* Synchronization Primitives ========================== The C-API provides a basic mutual exclusion lock. type PyMutex A mutual exclusion lock. The "PyMutex" should be initialized to zero to represent the unlocked state. For example:

   PyMutex mutex = {0};

Instances of "PyMutex" should not be copied or moved. Both the contents and address of a "PyMutex" are meaningful, and it must remain at a fixed, writable location in memory. Note: A "PyMutex" currently occupies one byte, but the size should be considered unstable. The size may change in future Python releases without a deprecation period. Added in version 3.13. void PyMutex_Lock(PyMutex *m) Lock mutex *m*. If another thread has already locked it, the calling thread will block until the mutex is unlocked. While blocked, the thread will temporarily release the *GIL* if it is held. Added in version 3.13. void PyMutex_Unlock(PyMutex *m) Unlock mutex *m*. The mutex must be locked — otherwise, the function will issue a fatal error. Added in version 3.13. Python Critical Section API --------------------------- The critical section API provides a deadlock avoidance layer on top of per-object locks for *free-threaded* CPython.
They are intended to replace reliance on the *global interpreter lock*, and are no-ops in versions of Python with the global interpreter lock. Critical sections avoid deadlocks by implicitly suspending active critical sections and releasing the locks during calls to "PyEval_SaveThread()". When "PyEval_RestoreThread()" is called, the most recent critical section is resumed, and its locks reacquired. This means the critical section API provides weaker guarantees than traditional locks – they are useful because their behavior is similar to the *GIL*. The functions and structs used by the macros are exposed for cases where C macros are not available. They should only be used as in the given macro expansions. Note that the sizes and contents of the structures may change in future Python versions. Note: Operations that need to lock two objects at once must use "Py_BEGIN_CRITICAL_SECTION2". You *cannot* use nested critical sections to lock more than one object at once, because the inner critical section may suspend the outer critical sections. This API does not provide a way to lock more than two objects at once. Example usage:

   static PyObject *
   set_field(MyObject *self, PyObject *value)
   {
       Py_BEGIN_CRITICAL_SECTION(self);
       Py_SETREF(self->field, Py_XNewRef(value));
       Py_END_CRITICAL_SECTION();
       Py_RETURN_NONE;
   }

In the above example, "Py_SETREF" calls "Py_DECREF", which can call arbitrary code through an object’s deallocation function. The critical section API avoids potential deadlocks due to reentrancy and lock ordering by allowing the runtime to temporarily suspend the critical section if the code triggered by the finalizer blocks and calls "PyEval_SaveThread()". Py_BEGIN_CRITICAL_SECTION(op) Acquires the per-object lock for the object *op* and begins a critical section. In the free-threaded build, this macro expands to:

   {
       PyCriticalSection _py_cs;
       PyCriticalSection_Begin(&_py_cs, (PyObject*)(op))

In the default build, this macro expands to "{". Added in version 3.13. Py_END_CRITICAL_SECTION() Ends the critical section and releases the per-object lock. In the free-threaded build, this macro expands to:

       PyCriticalSection_End(&_py_cs);
   }

In the default build, this macro expands to "}". Added in version 3.13. Py_BEGIN_CRITICAL_SECTION2(a, b) Acquires the per-object locks for the objects *a* and *b* and begins a critical section. The locks are acquired in a consistent order (lowest address first) to avoid lock ordering deadlocks. In the free-threaded build, this macro expands to:

   {
       PyCriticalSection2 _py_cs2;
       PyCriticalSection2_Begin(&_py_cs2, (PyObject*)(a), (PyObject*)(b))

In the default build, this macro expands to "{". Added in version 3.13. Py_END_CRITICAL_SECTION2() Ends the critical section and releases the per-object locks. In the free-threaded build, this macro expands to:

       PyCriticalSection2_End(&_py_cs2);
   }

In the default build, this macro expands to "}". Added in version 3.13. Python Initialization Configuration *********************************** Added in version 3.8. Python can be initialized with "Py_InitializeFromConfig()" and the "PyConfig" structure. It can be preinitialized with "Py_PreInitialize()" and the "PyPreConfig" structure. There are two kinds of configuration: * The Python Configuration can be used to build a customized Python which behaves as the regular Python. For example, environment variables and command line arguments are used to configure Python. * The Isolated Configuration can be used to embed Python into an application. It isolates Python from the system.
For example, environment variables are ignored, the LC_CTYPE locale is left unchanged and no signal handler is registered. The "Py_RunMain()" function can be used to write a customized Python program. See also Initialization, Finalization, and Threads. See also: **PEP 587** “Python Initialization Configuration”. Example ======= Example of a customized Python always running in isolated mode:

   int main(int argc, char **argv)
   {
       PyStatus status;

       PyConfig config;
       PyConfig_InitPythonConfig(&config);
       config.isolated = 1;

       /* Decode command line arguments. Implicitly preinitialize Python
          (in isolated mode). */
       status = PyConfig_SetBytesArgv(&config, argc, argv);
       if (PyStatus_Exception(status)) {
           goto exception;
       }

       status = Py_InitializeFromConfig(&config);
       if (PyStatus_Exception(status)) {
           goto exception;
       }
       PyConfig_Clear(&config);

       return Py_RunMain();

   exception:
       PyConfig_Clear(&config);
       if (PyStatus_IsExit(status)) {
           return status.exitcode;
       }
       /* Display the error message and exit the process with
          non-zero exit code */
       Py_ExitStatusException(status);
   }

PyWideStringList ================ type PyWideStringList List of "wchar_t*" strings. If *length* is non-zero, *items* must be non-"NULL" and all strings must be non-"NULL". Methods: PyStatus PyWideStringList_Append(PyWideStringList *list, const wchar_t *item) Append *item* to *list*. Python must be preinitialized to call this function. PyStatus PyWideStringList_Insert(PyWideStringList *list, Py_ssize_t index, const wchar_t *item) Insert *item* into *list* at *index*. If *index* is greater than or equal to *list* length, append *item* to *list*. *index* must be greater than or equal to "0". Python must be preinitialized to call this function. Structure fields: Py_ssize_t length List length. wchar_t **items List items. PyStatus ======== type PyStatus Structure to store an initialization function status: success, error or exit. For an error, it can store the C function name which created the error. Structure fields: int exitcode Exit code. Argument passed to "exit()". const char *err_msg Error message. const char *func Name of the function which created an error, can be "NULL". Functions to create a status: PyStatus PyStatus_Ok(void) Success. PyStatus PyStatus_Error(const char *err_msg) Initialization error with a message. *err_msg* must not be "NULL". PyStatus PyStatus_NoMemory(void) Memory allocation failure (out of memory). PyStatus PyStatus_Exit(int exitcode) Exit Python with the specified exit code. Functions to handle a status: int PyStatus_Exception(PyStatus status) Is the status an error or an exit? If true, the exception must be handled, for example by calling "Py_ExitStatusException()". int PyStatus_IsError(PyStatus status) Is the result an error? int PyStatus_IsExit(PyStatus status) Is the result an exit? void Py_ExitStatusException(PyStatus status) Call "exit(exitcode)" if *status* is an exit. Print the error message and exit with a non-zero exit code if *status* is an error. Must only be called if "PyStatus_Exception(status)" is non-zero. Note: Internally, Python uses macros which set "PyStatus.func", whereas functions to create a status set "func" to "NULL".
Example:

   PyStatus alloc(void **ptr, size_t size)
   {
       *ptr = PyMem_RawMalloc(size);
       if (*ptr == NULL) {
           return PyStatus_NoMemory();
       }
       return PyStatus_Ok();
   }

   int main(int argc, char **argv)
   {
       void *ptr;
       PyStatus status = alloc(&ptr, 16);
       if (PyStatus_Exception(status)) {
           Py_ExitStatusException(status);
       }
       PyMem_RawFree(ptr);
       return 0;
   }

PyPreConfig =========== type PyPreConfig Structure used to preinitialize Python. Function to initialize a preconfiguration: void PyPreConfig_InitPythonConfig(PyPreConfig *preconfig) Initialize the preconfiguration with Python Configuration. void PyPreConfig_InitIsolatedConfig(PyPreConfig *preconfig) Initialize the preconfiguration with Isolated Configuration. Structure fields: int allocator Name of the Python memory allocators: * "PYMEM_ALLOCATOR_NOT_SET" ("0"): don’t change memory allocators (use defaults). * "PYMEM_ALLOCATOR_DEFAULT" ("1"): default memory allocators. * "PYMEM_ALLOCATOR_DEBUG" ("2"): default memory allocators with debug hooks. * "PYMEM_ALLOCATOR_MALLOC" ("3"): use "malloc()" of the C library. * "PYMEM_ALLOCATOR_MALLOC_DEBUG" ("4"): force usage of "malloc()" with debug hooks. * "PYMEM_ALLOCATOR_PYMALLOC" ("5"): Python pymalloc memory allocator. * "PYMEM_ALLOCATOR_PYMALLOC_DEBUG" ("6"): Python pymalloc memory allocator with debug hooks. * "PYMEM_ALLOCATOR_MIMALLOC" ("7"): use "mimalloc", a fast malloc replacement. * "PYMEM_ALLOCATOR_MIMALLOC_DEBUG" ("8"): use "mimalloc", a fast malloc replacement with debug hooks. "PYMEM_ALLOCATOR_PYMALLOC" and "PYMEM_ALLOCATOR_PYMALLOC_DEBUG" are not supported if Python is "configured using --without-pymalloc". "PYMEM_ALLOCATOR_MIMALLOC" and "PYMEM_ALLOCATOR_MIMALLOC_DEBUG" are not supported if Python is "configured using --without-mimalloc" or if the underlying atomic support isn’t available. See Memory Management. Default: "PYMEM_ALLOCATOR_NOT_SET". int configure_locale Set the LC_CTYPE locale to the user preferred locale. If equals to "0", set "coerce_c_locale" and "coerce_c_locale_warn" members to "0". See the *locale encoding*. Default: "1" in Python config, "0" in isolated config. int coerce_c_locale If equals to "2", coerce the C locale. If equals to "1", read the LC_CTYPE locale to decide if it should be coerced. See the *locale encoding*. Default: "-1" in Python config, "0" in isolated config. int coerce_c_locale_warn If non-zero, emit a warning if the C locale is coerced. Default: "-1" in Python config, "0" in isolated config. int dev_mode Python Development Mode: see "PyConfig.dev_mode". Default: "-1" in Python mode, "0" in isolated mode. int isolated Isolated mode: see "PyConfig.isolated". Default: "0" in Python mode, "1" in isolated mode. int legacy_windows_fs_encoding If non-zero: * Set "PyPreConfig.utf8_mode" to "0", * Set "PyConfig.filesystem_encoding" to ""mbcs"", * Set "PyConfig.filesystem_errors" to ""replace"". Initialized from the "PYTHONLEGACYWINDOWSFSENCODING" environment variable value. Only available on Windows. "#ifdef MS_WINDOWS" macro can be used for Windows specific code. Default: "0". int parse_argv If non-zero, "Py_PreInitializeFromArgs()" and "Py_PreInitializeFromBytesArgs()" parse their "argv" argument the same way the regular Python parses command line arguments: see Command Line Arguments. Default: "1" in Python config, "0" in isolated config. int use_environment Use environment variables? See "PyConfig.use_environment". Default: "1" in Python config and "0" in isolated config. int utf8_mode If non-zero, enable the Python UTF-8 Mode.
Set to "0" or "1" by the "-X utf8" command line option and the "PYTHONUTF8" environment variable. Also set to "1" if the "LC_CTYPE" locale is "C" or "POSIX". Default: "-1" in Python config and "0" in isolated config. Preinitialize Python with PyPreConfig ===================================== The preinitialization of Python: * Set the Python memory allocators ("PyPreConfig.allocator") * Configure the LC_CTYPE locale (*locale encoding*) * Set the Python UTF-8 Mode ("PyPreConfig.utf8_mode") The current preconfiguration ("PyPreConfig" type) is stored in "_PyRuntime.preconfig". Functions to preinitialize Python: PyStatus Py_PreInitialize(const PyPreConfig *preconfig) Preinitialize Python from *preconfig* preconfiguration. *preconfig* must not be "NULL". PyStatus Py_PreInitializeFromBytesArgs(const PyPreConfig *preconfig, int argc, char *const *argv) Preinitialize Python from *preconfig* preconfiguration. Parse *argv* command line arguments (bytes strings) if "parse_argv" of *preconfig* is non-zero. *preconfig* must not be "NULL". PyStatus Py_PreInitializeFromArgs(const PyPreConfig *preconfig, int argc, wchar_t *const *argv) Preinitialize Python from *preconfig* preconfiguration. Parse *argv* command line arguments (wide strings) if "parse_argv" of *preconfig* is non-zero. *preconfig* must not be "NULL". The caller is responsible to handle exceptions (error or exit) using "PyStatus_Exception()" and "Py_ExitStatusException()". For Python Configuration ("PyPreConfig_InitPythonConfig()"), if Python is initialized with command line arguments, the command line arguments must also be passed to preinitialize Python, since they have an effect on the pre-configuration like encodings. For example, the "-X utf8" command line option enables the Python UTF-8 Mode. "PyMem_SetAllocator()" can be called after "Py_PreInitialize()" and before "Py_InitializeFromConfig()" to install a custom memory allocator. It can be called before "Py_PreInitialize()" if "PyPreConfig.allocator" is set to "PYMEM_ALLOCATOR_NOT_SET". Python memory allocation functions like "PyMem_RawMalloc()" must not be used before the Python preinitialization, whereas calling directly "malloc()" and "free()" is always safe. "Py_DecodeLocale()" must not be called before the Python preinitialization. Example using the preinitialization to enable the Python UTF-8 Mode: PyStatus status; PyPreConfig preconfig; PyPreConfig_InitPythonConfig(&preconfig); preconfig.utf8_mode = 1; status = Py_PreInitialize(&preconfig); if (PyStatus_Exception(status)) { Py_ExitStatusException(status); } /* at this point, Python speaks UTF-8 */ Py_Initialize(); /* ... use Python API here ... */ Py_Finalize(); PyConfig ======== type PyConfig Structure containing most parameters to configure Python. When done, the "PyConfig_Clear()" function must be used to release the configuration memory. Structure methods: void PyConfig_InitPythonConfig(PyConfig *config) Initialize configuration with the Python Configuration. void PyConfig_InitIsolatedConfig(PyConfig *config) Initialize configuration with the Isolated Configuration. PyStatus PyConfig_SetString(PyConfig *config, wchar_t *const *config_str, const wchar_t *str) Copy the wide character string *str* into "*config_str". Preinitialize Python if needed. PyStatus PyConfig_SetBytesString(PyConfig *config, wchar_t *const *config_str, const char *str) Decode *str* using "Py_DecodeLocale()" and set the result into "*config_str". Preinitialize Python if needed. 
PyStatus PyConfig_SetArgv(PyConfig *config, int argc, wchar_t *const *argv) Set command line arguments ("argv" member of *config*) from the *argv* list of wide character strings. Preinitialize Python if needed. PyStatus PyConfig_SetBytesArgv(PyConfig *config, int argc, char *const *argv) Set command line arguments ("argv" member of *config*) from the *argv* list of bytes strings. Decode bytes using "Py_DecodeLocale()". Preinitialize Python if needed. PyStatus PyConfig_SetWideStringList(PyConfig *config, PyWideStringList *list, Py_ssize_t length, wchar_t **items) Set the list of wide strings *list* to *length* and *items*. Preinitialize Python if needed. PyStatus PyConfig_Read(PyConfig *config) Read all Python configuration. Fields which are already initialized are left unchanged. Fields for path configuration are no longer calculated or modified when calling this function, as of Python 3.11. The "PyConfig_Read()" function only parses "PyConfig.argv" arguments once: "PyConfig.parse_argv" is set to "2" after arguments are parsed. Since Python arguments are stripped from "PyConfig.argv", parsing arguments twice would parse the application options as Python options. Preinitialize Python if needed. Changed in version 3.10: The "PyConfig.argv" arguments are now only parsed once, "PyConfig.parse_argv" is set to "2" after arguments are parsed, and arguments are only parsed if "PyConfig.parse_argv" equals "1". Changed in version 3.11: "PyConfig_Read()" no longer calculates all paths, and so fields listed under Python Path Configuration may no longer be updated until "Py_InitializeFromConfig()" is called. void PyConfig_Clear(PyConfig *config) Release configuration memory. Most "PyConfig" methods preinitialize Python if needed. In that case, the Python preinitialization configuration ("PyPreConfig") is based on the "PyConfig". If configuration fields which are in common with "PyPreConfig" are tuned, they must be set before calling a "PyConfig" method: * "PyConfig.dev_mode" * "PyConfig.isolated" * "PyConfig.parse_argv" * "PyConfig.use_environment" Moreover, if "PyConfig_SetArgv()" or "PyConfig_SetBytesArgv()" is used, this method must be called before other methods, since the preinitialization configuration depends on command line arguments (if "parse_argv" is non-zero). The caller of these methods is responsible for handling exceptions (error or exit) using "PyStatus_Exception()" and "Py_ExitStatusException()". Structure fields: PyWideStringList argv Set "sys.argv" command line arguments based on "argv". These parameters are similar to those passed to the program’s "main()" function with the difference that the first entry should refer to the script file to be executed rather than the executable hosting the Python interpreter. If there isn’t a script that will be run, the first entry in "argv" can be an empty string. Set "parse_argv" to "1" to parse "argv" the same way the regular Python parses Python command line arguments and then to strip Python arguments from "argv". If "argv" is empty, an empty string is added to ensure that "sys.argv" always exists and is never empty. Default: "NULL". See also the "orig_argv" member. int safe_path If equals to zero, "Py_RunMain()" prepends a potentially unsafe path to "sys.path" at startup: * If "argv[0]" is equal to "L"-m"" ("python -m module"), prepend the current working directory. * If running a script ("python script.py"), prepend the script’s directory. If it’s a symbolic link, resolve symbolic links.
* Otherwise ("python -c code" and "python"), prepend an empty string, which means the current working directory. Set to "1" by the "-P" command line option and the "PYTHONSAFEPATH" environment variable. Default: "0" in Python config, "1" in isolated config. Added in version 3.11. wchar_t *base_exec_prefix "sys.base_exec_prefix". Default: "NULL". Part of the Python Path Configuration output. See also "PyConfig.exec_prefix". wchar_t *base_executable Python base executable: "sys._base_executable". Set by the "__PYVENV_LAUNCHER__" environment variable. Set from "PyConfig.executable" if "NULL". Default: "NULL". Part of the Python Path Configuration output. See also "PyConfig.executable". wchar_t *base_prefix "sys.base_prefix". Default: "NULL". Part of the Python Path Configuration output. See also "PyConfig.prefix". int buffered_stdio If equals to "0" and "configure_c_stdio" is non-zero, disable buffering on the C streams stdout and stderr. Set to "0" by the "-u" command line option and the "PYTHONUNBUFFERED" environment variable. stdin is always opened in buffered mode. Default: "1". int bytes_warning If equals to "1", issue a warning when comparing "bytes" or "bytearray" with "str", or comparing "bytes" with "int". If equal to or greater than "2", raise a "BytesWarning" exception in these cases. Incremented by the "-b" command line option. Default: "0". int warn_default_encoding If non-zero, emit an "EncodingWarning" warning when "io.TextIOWrapper" uses its default encoding. See Opt-in EncodingWarning for details. Default: "0". Added in version 3.10. int code_debug_ranges If equals to "0", disables the inclusion of the end line and column mappings in code objects. Also disables traceback printing carets to specific error locations. Set to "0" by the "PYTHONNODEBUGRANGES" environment variable and by the "-X no_debug_ranges" command line option. Default: "1". Added in version 3.11. wchar_t *check_hash_pycs_mode Control the validation behavior of hash-based ".pyc" files: value of the "--check-hash-based-pycs" command line option. Valid values: * "L"always"": Hash the source file for invalidation regardless of the value of the ‘check_source’ flag. * "L"never"": Assume that hash-based pycs are always valid. * "L"default"": The ‘check_source’ flag in hash-based pycs determines invalidation. Default: "L"default"". See also **PEP 552** “Deterministic pycs”. int configure_c_stdio If non-zero, configure C standard streams: * On Windows, set the binary mode ("O_BINARY") on stdin, stdout and stderr. * If "buffered_stdio" equals zero, disable buffering of stdin, stdout and stderr streams. * If "interactive" is non-zero, enable stream buffering on stdin and stdout (only stdout on Windows). Default: "1" in Python config, "0" in isolated config. int dev_mode If non-zero, enable the Python Development Mode. Set to "1" by the "-X dev" option and the "PYTHONDEVMODE" environment variable. Default: "-1" in Python mode, "0" in isolated mode. int dump_refs Dump Python references? If non-zero, dump all objects which are still alive at exit. Set to "1" by the "PYTHONDUMPREFS" environment variable. Needs a special build of Python with the "Py_TRACE_REFS" macro defined: see the "configure --with-trace-refs option". Default: "0". wchar_t *exec_prefix The site-specific directory prefix where the platform-dependent Python files are installed: "sys.exec_prefix". Default: "NULL". Part of the Python Path Configuration output. See also "PyConfig.base_exec_prefix".
wchar_t *executable The absolute path of the executable binary for the Python interpreter: "sys.executable". Default: "NULL". Part of the Python Path Configuration output. See also "PyConfig.base_executable". int faulthandler Enable faulthandler? If non-zero, call "faulthandler.enable()" at startup. Set to "1" by "-X faulthandler" and the "PYTHONFAULTHANDLER" environment variable. Default: "-1" in Python mode, "0" in isolated mode. wchar_t *filesystem_encoding *Filesystem encoding*: "sys.getfilesystemencoding()". On macOS, Android and VxWorks: use ""utf-8"" by default. On Windows: use ""utf-8"" by default, or ""mbcs"" if "legacy_windows_fs_encoding" of "PyPreConfig" is non-zero. Default encoding on other platforms: * ""utf-8"" if "PyPreConfig.utf8_mode" is non-zero. * ""ascii"" if Python detects that "nl_langinfo(CODESET)" announces the ASCII encoding, whereas the "mbstowcs()" function decodes from a different encoding (usually Latin1). * ""utf-8"" if "nl_langinfo(CODESET)" returns an empty string. * Otherwise, use the *locale encoding*: "nl_langinfo(CODESET)" result. At Python startup, the encoding name is normalized to the Python codec name. For example, ""ANSI_X3.4-1968"" is replaced with ""ascii"". See also the "filesystem_errors" member. wchar_t *filesystem_errors *Filesystem error handler*: "sys.getfilesystemencodeerrors()". On Windows: use ""surrogatepass"" by default, or ""replace"" if "legacy_windows_fs_encoding" of "PyPreConfig" is non-zero. On other platforms: use ""surrogateescape"" by default. Supported error handlers: * ""strict"" * ""surrogateescape"" * ""surrogatepass"" (only supported with the UTF-8 encoding) See also the "filesystem_encoding" member. unsigned long hash_seed int use_hash_seed Randomized hash function seed. If "use_hash_seed" is zero, a seed is chosen randomly at Python startup, and "hash_seed" is ignored. Set by the "PYTHONHASHSEED" environment variable. Default *use_hash_seed* value: "-1" in Python mode, "0" in isolated mode. wchar_t *home Set the default Python “home” directory, that is, the location of the standard Python libraries (see "PYTHONHOME"). Set by the "PYTHONHOME" environment variable. Default: "NULL". Part of the Python Path Configuration input. int import_time If non-zero, profile import time. Set to "1" by the "-X importtime" option and the "PYTHONPROFILEIMPORTTIME" environment variable. Default: "0". int inspect Enter interactive mode after executing a script or a command. If greater than "0", enable inspect: when a script is passed as first argument or the -c option is used, enter interactive mode after executing the script or the command, even when "sys.stdin" does not appear to be a terminal. Incremented by the "-i" command line option. Set to "1" if the "PYTHONINSPECT" environment variable is non-empty. Default: "0". int install_signal_handlers Install Python signal handlers? Default: "1" in Python mode, "0" in isolated mode. int interactive If greater than "0", enable the interactive mode (REPL). Incremented by the "-i" command line option. Default: "0". int int_max_str_digits Configures the integer string conversion length limitation. An initial value of "-1" means the value will be taken from the command line or environment or otherwise default to 4300 ("sys.int_info.default_max_str_digits"). A value of "0" disables the limitation. Values greater than zero but less than 640 ("sys.int_info.str_digits_check_threshold") are unsupported and will produce an error.
Configured by the "-X int_max_str_digits" command line flag or the "PYTHONINTMAXSTRDIGITS" environment variable. Default: "-1" in Python mode. 4300 ("sys.int_info.default_max_str_digits") in isolated mode. Added in version 3.12. int cpu_count If the value of "cpu_count" is not "-1" then it will override the return values of "os.cpu_count()", "os.process_cpu_count()", and "multiprocessing.cpu_count()". Configured by the "-X cpu_count=*n|default*" command line flag or the "PYTHON_CPU_COUNT" environment variable. Default: "-1". Added in version 3.13. int isolated If greater than "0", enable isolated mode: * Set "safe_path" to "1": don’t prepend a potentially unsafe path to "sys.path" at Python startup, such as the current directory, the script’s directory or an empty string. * Set "use_environment" to "0": ignore "PYTHON" environment variables. * Set "user_site_directory" to "0": don’t add the user site directory to "sys.path". * Python REPL doesn’t import "readline" nor enable default readline configuration on interactive prompts. Set to "1" by the "-I" command line option. Default: "0" in Python mode, "1" in isolated mode. See also the Isolated Configuration and "PyPreConfig.isolated". int legacy_windows_stdio If non-zero, use "io.FileIO" instead of "io._WindowsConsoleIO" for "sys.stdin", "sys.stdout" and "sys.stderr". Set to "1" if the "PYTHONLEGACYWINDOWSSTDIO" environment variable is set to a non-empty string. Only available on Windows. "#ifdef MS_WINDOWS" macro can be used for Windows specific code. Default: "0". See also the **PEP 528** (Change Windows console encoding to UTF-8). int malloc_stats If non-zero, dump statistics on Python pymalloc memory allocator at exit. Set to "1" by the "PYTHONMALLOCSTATS" environment variable. The option is ignored if Python is "configured using the --without-pymalloc option". Default: "0". wchar_t *platlibdir Platform library directory name: "sys.platlibdir". Set by the "PYTHONPLATLIBDIR" environment variable. Default: value of the "PLATLIBDIR" macro which is set by the "configure --with-platlibdir option" (default: ""lib"", or ""DLLs"" on Windows). Part of the Python Path Configuration input. Added in version 3.9. Changed in version 3.11: This macro is now used on Windows to locate the standard library extension modules, typically under "DLLs". However, for compatibility, note that this value is ignored for any non-standard layouts, including in-tree builds and virtual environments. wchar_t *pythonpath_env Module search paths ("sys.path") as a string separated by "DELIM" ("os.pathsep"). Set by the "PYTHONPATH" environment variable. Default: "NULL". Part of the Python Path Configuration input. PyWideStringList module_search_paths int module_search_paths_set Module search paths: "sys.path". If "module_search_paths_set" is equal to "0", "Py_InitializeFromConfig()" will replace "module_search_paths" and sets "module_search_paths_set" to "1". Default: empty list ("module_search_paths") and "0" ("module_search_paths_set"). Part of the Python Path Configuration output. int optimization_level Compilation optimization level: * "0": Peephole optimizer, set "__debug__" to "True". * "1": Level 0, remove assertions, set "__debug__" to "False". * "2": Level 1, strip docstrings. Incremented by the "-O" command line option. Set to the "PYTHONOPTIMIZE" environment variable value. Default: "0". PyWideStringList orig_argv The list of the original command line arguments passed to the Python executable: "sys.orig_argv". 
If "orig_argv" list is empty and "argv" is not a list only containing an empty string, "PyConfig_Read()" copies "argv" into "orig_argv" before modifying "argv" (if "parse_argv" is non- zero). See also the "argv" member and the "Py_GetArgcArgv()" function. Default: empty list. Added in version 3.10. int parse_argv Parse command line arguments? If equals to "1", parse "argv" the same way the regular Python parses command line arguments, and strip Python arguments from "argv". The "PyConfig_Read()" function only parses "PyConfig.argv" arguments once: "PyConfig.parse_argv" is set to "2" after arguments are parsed. Since Python arguments are stripped from "PyConfig.argv", parsing arguments twice would parse the application options as Python options. Default: "1" in Python mode, "0" in isolated mode. Changed in version 3.10: The "PyConfig.argv" arguments are now only parsed if "PyConfig.parse_argv" equals to "1". int parser_debug Parser debug mode. If greater than "0", turn on parser debugging output (for expert only, depending on compilation options). Incremented by the "-d" command line option. Set to the "PYTHONDEBUG" environment variable value. Needs a debug build of Python (the "Py_DEBUG" macro must be defined). Default: "0". int pathconfig_warnings If non-zero, calculation of path configuration is allowed to log warnings into "stderr". If equals to "0", suppress these warnings. Default: "1" in Python mode, "0" in isolated mode. Part of the Python Path Configuration input. Changed in version 3.11: Now also applies on Windows. wchar_t *prefix The site-specific directory prefix where the platform independent Python files are installed: "sys.prefix". Default: "NULL". Part of the Python Path Configuration output. See also "PyConfig.base_prefix". wchar_t *program_name Program name used to initialize "executable" and in early error messages during Python initialization. * On macOS, use "PYTHONEXECUTABLE" environment variable if set. * If the "WITH_NEXT_FRAMEWORK" macro is defined, use "__PYVENV_LAUNCHER__" environment variable if set. * Use "argv[0]" of "argv" if available and non-empty. * Otherwise, use "L"python"" on Windows, or "L"python3"" on other platforms. Default: "NULL". Part of the Python Path Configuration input. wchar_t *pycache_prefix Directory where cached ".pyc" files are written: "sys.pycache_prefix". Set by the "-X pycache_prefix=PATH" command line option and the "PYTHONPYCACHEPREFIX" environment variable. The command-line option takes precedence. If "NULL", "sys.pycache_prefix" is set to "None". Default: "NULL". int quiet Quiet mode. If greater than "0", don’t display the copyright and version at Python startup in interactive mode. Incremented by the "-q" command line option. Default: "0". wchar_t *run_command Value of the "-c" command line option. Used by "Py_RunMain()". Default: "NULL". wchar_t *run_filename Filename passed on the command line: trailing command line argument without "-c" or "-m". It is used by the "Py_RunMain()" function. For example, it is set to "script.py" by the "python3 script.py arg" command line. See also the "PyConfig.skip_source_first_line" option. Default: "NULL". wchar_t *run_module Value of the "-m" command line option. Used by "Py_RunMain()". Default: "NULL". wchar_t *run_presite "package.module" path to module that should be imported before "site.py" is run. Set by the "-X presite=package.module" command-line option and the "PYTHON_PRESITE" environment variable. The command-line option takes precedence. 
Needs a debug build of Python (the "Py_DEBUG" macro must be defined). Default: "NULL". int show_ref_count Show the total reference count at exit (excluding *immortal* objects)? Set to "1" by the "-X showrefcount" command line option. Needs a debug build of Python (the "Py_REF_DEBUG" macro must be defined). Default: "0". int site_import Import the "site" module at startup? If equal to zero, disable the import of the module site and the site-dependent manipulations of "sys.path" that it entails. Also disable these manipulations if the "site" module is explicitly imported later (call "site.main()" if you want them to be triggered). Set to "0" by the "-S" command line option. "sys.flags.no_site" is set to the inverted value of "site_import". Default: "1". int skip_source_first_line If non-zero, skip the first line of the "PyConfig.run_filename" source. It allows the usage of non-Unix forms of "#!cmd". This is intended for a DOS specific hack only. Set to "1" by the "-x" command line option. Default: "0". wchar_t *stdio_encoding wchar_t *stdio_errors Encoding and encoding errors of "sys.stdin", "sys.stdout" and "sys.stderr" (but "sys.stderr" always uses the ""backslashreplace"" error handler). Use the "PYTHONIOENCODING" environment variable if it is non-empty. Default encoding: * ""UTF-8"" if "PyPreConfig.utf8_mode" is non-zero. * Otherwise, use the *locale encoding*. Default error handler: * On Windows: use ""surrogateescape"". * ""surrogateescape"" if "PyPreConfig.utf8_mode" is non-zero, or if the LC_CTYPE locale is “C” or “POSIX”. * ""strict"" otherwise. See also "PyConfig.legacy_windows_stdio". int tracemalloc Enable tracemalloc? If non-zero, call "tracemalloc.start()" at startup. Set by the "-X tracemalloc=N" command line option and by the "PYTHONTRACEMALLOC" environment variable. Default: "-1" in Python mode, "0" in isolated mode. int perf_profiling Enable compatibility mode with the perf profiler? If non-zero, initialize the perf trampoline. See Python support for the Linux perf profiler for more information. Set by the "-X perf" command-line option and the "PYTHONPERFSUPPORT" environment variable for perf support with stack pointers, and by the "-X perf_jit" command-line option and the "PYTHON_PERF_JIT_SUPPORT" environment variable for perf support with DWARF JIT information. Default: "-1". Added in version 3.12. int use_environment Use environment variables? If equal to zero, ignore the environment variables. Set to "0" by the "-E" command line option. Default: "1" in Python config and "0" in isolated config. int user_site_directory If non-zero, add the user site directory to "sys.path". Set to "0" by the "-s" and "-I" command line options. Set to "0" by the "PYTHONNOUSERSITE" environment variable. Default: "1" in Python mode, "0" in isolated mode. int verbose Verbose mode. If greater than "0", print a message each time a module is imported, showing the place (filename or built-in module) from which it is loaded. If greater than or equal to "2", print a message for each file that is checked for when searching for a module. Also provides information on module cleanup at exit. Incremented by the "-v" command line option. Set by the "PYTHONVERBOSE" environment variable value. Default: "0". PyWideStringList warnoptions Options of the "warnings" module to build warnings filters, lowest to highest priority: "sys.warnoptions".
The "warnings" module adds "sys.warnoptions" in the reverse order: the last "PyConfig.warnoptions" item becomes the first item of "warnings.filters" which is checked first (highest priority). The "-W" command line options adds its value to "warnoptions", it can be used multiple times. The "PYTHONWARNINGS" environment variable can also be used to add warning options. Multiple options can be specified, separated by commas (","). Default: empty list. int write_bytecode If equal to "0", Python won’t try to write ".pyc" files on the import of source modules. Set to "0" by the "-B" command line option and the "PYTHONDONTWRITEBYTECODE" environment variable. "sys.dont_write_bytecode" is initialized to the inverted value of "write_bytecode". Default: "1". PyWideStringList xoptions Values of the "-X" command line options: "sys._xoptions". Default: empty list. If "parse_argv" is non-zero, "argv" arguments are parsed the same way the regular Python parses command line arguments, and Python arguments are stripped from "argv". The "xoptions" options are parsed to set other options: see the "-X" command line option. Changed in version 3.9: The "show_alloc_count" field has been removed. Initialization with PyConfig ============================ Initializing the interpreter from a populated configuration struct is handled by calling "Py_InitializeFromConfig()". The caller is responsible to handle exceptions (error or exit) using "PyStatus_Exception()" and "Py_ExitStatusException()". If "PyImport_FrozenModules()", "PyImport_AppendInittab()" or "PyImport_ExtendInittab()" are used, they must be set or called after Python preinitialization and before the Python initialization. If Python is initialized multiple times, "PyImport_AppendInittab()" or "PyImport_ExtendInittab()" must be called before each Python initialization. The current configuration ("PyConfig" type) is stored in "PyInterpreterState.config". Example setting the program name: void init_python(void) { PyStatus status; PyConfig config; PyConfig_InitPythonConfig(&config); /* Set the program name. Implicitly preinitialize Python. */ status = PyConfig_SetString(&config, &config.program_name, L"/path/to/my_program"); if (PyStatus_Exception(status)) { goto exception; } status = Py_InitializeFromConfig(&config); if (PyStatus_Exception(status)) { goto exception; } PyConfig_Clear(&config); return; exception: PyConfig_Clear(&config); Py_ExitStatusException(status); } More complete example modifying the default configuration, read the configuration, and then override some parameters. Note that since 3.11, many parameters are not calculated until initialization, and so values cannot be read from the configuration structure. Any values set before initialize is called will be left unchanged by initialization: PyStatus init_python(const char *program_name) { PyStatus status; PyConfig config; PyConfig_InitPythonConfig(&config); /* Set the program name before reading the configuration (decode byte string from the locale encoding). Implicitly preinitialize Python. 
       */
       status = PyConfig_SetBytesString(&config, &config.program_name,
                                        program_name);
       if (PyStatus_Exception(status)) {
           goto done;
       }

       /* Read all configuration at once */
       status = PyConfig_Read(&config);
       if (PyStatus_Exception(status)) {
           goto done;
       }

       /* Specify sys.path explicitly */
       /* If you want to modify the default set of paths, finish
          initialization first and then use PySys_GetObject("path") */
       config.module_search_paths_set = 1;
       status = PyWideStringList_Append(&config.module_search_paths,
                                        L"/path/to/stdlib");
       if (PyStatus_Exception(status)) {
           goto done;
       }
       status = PyWideStringList_Append(&config.module_search_paths,
                                        L"/path/to/more/modules");
       if (PyStatus_Exception(status)) {
           goto done;
       }

       /* Override executable computed by PyConfig_Read() */
       status = PyConfig_SetString(&config, &config.executable,
                                   L"/path/to/my_executable");
       if (PyStatus_Exception(status)) {
           goto done;
       }

       status = Py_InitializeFromConfig(&config);

   done:
       PyConfig_Clear(&config);
       return status;
   }

Isolated Configuration ====================== The "PyPreConfig_InitIsolatedConfig()" and "PyConfig_InitIsolatedConfig()" functions create a configuration to isolate Python from the system; for example, to embed Python into an application. This configuration ignores global configuration variables, environment variables, command line arguments ("PyConfig.argv" is not parsed) and the user site directory. The C standard streams (ex: "stdout") and the LC_CTYPE locale are left unchanged. Signal handlers are not installed. Configuration files are still used with this configuration to determine paths that are unspecified. Ensure "PyConfig.home" is specified to avoid computing the default path configuration. Python Configuration ==================== The "PyPreConfig_InitPythonConfig()" and "PyConfig_InitPythonConfig()" functions create a configuration to build a customized Python which behaves as the regular Python. Environment variables and command line arguments are used to configure Python, whereas global configuration variables are ignored. This configuration enables C locale coercion (**PEP 538**) and the Python UTF-8 Mode (**PEP 540**) depending on the LC_CTYPE locale and the "PYTHONUTF8" and "PYTHONCOERCECLOCALE" environment variables. Python Path Configuration ========================= "PyConfig" contains multiple fields for the path configuration: * Path configuration inputs: * "PyConfig.home" * "PyConfig.platlibdir" * "PyConfig.pathconfig_warnings" * "PyConfig.program_name" * "PyConfig.pythonpath_env" * current working directory: to get absolute paths * "PATH" environment variable to get the program full path (from "PyConfig.program_name") * "__PYVENV_LAUNCHER__" environment variable * (Windows only) Application paths in the registry under “Software\Python\PythonCore\X.Y\PythonPath” of HKEY_CURRENT_USER and HKEY_LOCAL_MACHINE (where X.Y is the Python version). * Path configuration output fields: * "PyConfig.base_exec_prefix" * "PyConfig.base_executable" * "PyConfig.base_prefix" * "PyConfig.exec_prefix" * "PyConfig.executable" * "PyConfig.module_search_paths_set", "PyConfig.module_search_paths" * "PyConfig.prefix" If at least one “output field” is not set, Python calculates the path configuration to fill unset fields. If "module_search_paths_set" is equal to "0", "module_search_paths" is overridden and "module_search_paths_set" is set to "1". It is possible to completely ignore the function calculating the default path configuration by explicitly setting all path configuration output fields listed above. A string is considered as set even if it is empty.
"module_search_paths" is considered as set if "module_search_paths_set" is set to "1". In this case, "module_search_paths" will be used without modification. Set "pathconfig_warnings" to "0" to suppress warnings when calculating the path configuration (Unix only, Windows does not log any warning). If "base_prefix" or "base_exec_prefix" fields are not set, they inherit their value from "prefix" and "exec_prefix" respectively. "Py_RunMain()" and "Py_Main()" modify "sys.path": * If "run_filename" is set and is a directory which contains a "__main__.py" script, prepend "run_filename" to "sys.path". * If "isolated" is zero: * If "run_module" is set, prepend the current directory to "sys.path". Do nothing if the current directory cannot be read. * If "run_filename" is set, prepend the directory of the filename to "sys.path". * Otherwise, prepend an empty string to "sys.path". If "site_import" is non-zero, "sys.path" can be modified by the "site" module. If "user_site_directory" is non-zero and the user’s site- package directory exists, the "site" module appends the user’s site- package directory to "sys.path". The following configuration files are used by the path configuration: * "pyvenv.cfg" * "._pth" file (ex: "python._pth") * "pybuilddir.txt" (Unix only) If a "._pth" file is present: * Set "isolated" to "1". * Set "use_environment" to "0". * Set "site_import" to "0". * Set "safe_path" to "1". The "__PYVENV_LAUNCHER__" environment variable is used to set "PyConfig.base_executable". Py_GetArgcArgv() ================ void Py_GetArgcArgv(int *argc, wchar_t ***argv) Get the original command line arguments, before Python modified them. See also "PyConfig.orig_argv" member. Multi-Phase Initialization Private Provisional API ================================================== This section is a private provisional API introducing multi-phase initialization, the core feature of **PEP 432**: * “Core” initialization phase, “bare minimum Python”: * Builtin types; * Builtin exceptions; * Builtin and frozen modules; * The "sys" module is only partially initialized (ex: "sys.path" doesn’t exist yet). * “Main” initialization phase, Python is fully initialized: * Install and configure "importlib"; * Apply the Path Configuration; * Install signal handlers; * Finish "sys" module initialization (ex: create "sys.stdout" and "sys.path"); * Enable optional features like "faulthandler" and "tracemalloc"; * Import the "site" module; * etc. Private provisional API: * "PyConfig._init_main": if set to "0", "Py_InitializeFromConfig()" stops at the “Core” initialization phase. PyStatus _Py_InitializeMain(void) Move to the “Main” initialization phase, finish the Python initialization. No module is imported during the “Core” phase and the "importlib" module is not configured: the Path Configuration is only applied during the “Main” phase. It may allow to customize Python in Python to override or tune the Path Configuration, maybe install a custom "sys.meta_path" importer or an import hook, etc. It may become possible to calculate the Path Configuration in Python, after the Core phase and before the Main phase, which is one of the **PEP 432** motivation. The “Core” phase is not properly defined: what should be and what should not be available at this phase is not specified yet. The API is marked as private and provisional: the API can be modified or even be removed anytime until a proper public API is designed. 
Example running Python code between “Core” and “Main” initialization phases:

   void init_python(void)
   {
       PyStatus status;

       PyConfig config;
       PyConfig_InitPythonConfig(&config);
       config._init_main = 0;

       /* ... customize 'config' configuration ... */

       status = Py_InitializeFromConfig(&config);
       PyConfig_Clear(&config);
       if (PyStatus_Exception(status)) {
           Py_ExitStatusException(status);
       }

       /* Use sys.stderr because sys.stdout is only created
          by _Py_InitializeMain() */
       int res = PyRun_SimpleString(
           "import sys; "
           "print('Run Python code before _Py_InitializeMain', "
           "file=sys.stderr)");
       if (res < 0) {
           exit(1);
       }

       /* ... put more configuration code here ... */

       status = _Py_InitializeMain();
       if (PyStatus_Exception(status)) {
           Py_ExitStatusException(status);
       }
   }

Introduction ************ The Application Programmer’s Interface to Python gives C and C++ programmers access to the Python interpreter at a variety of levels. The API is equally usable from C++, but for brevity it is generally referred to as the Python/C API. There are two fundamentally different reasons for using the Python/C API. The first reason is to write *extension modules* for specific purposes; these are C modules that extend the Python interpreter. This is probably the most common use. The second reason is to use Python as a component in a larger application; this technique is generally referred to as *embedding* Python in an application. Writing an extension module is a relatively well-understood process, where a “cookbook” approach works well. There are several tools that automate the process to some extent. While people have embedded Python in other applications since its early existence, the process of embedding Python is less straightforward than writing an extension. Many API functions are useful independent of whether you’re embedding or extending Python; moreover, most applications that embed Python will need to provide a custom extension as well, so it’s probably a good idea to become familiar with writing an extension before attempting to embed Python in a real application. Coding standards ================ If you’re writing C code for inclusion in CPython, you **must** follow the guidelines and standards defined in **PEP 7**. These guidelines apply regardless of the version of Python you are contributing to. Following these conventions is not necessary for your own third party extension modules, unless you eventually expect to contribute them to Python. Include Files ============= All function, type and macro definitions needed to use the Python/C API are included in your code by the following lines:

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

This implies inclusion of the following standard headers: "<stdio.h>", "<string.h>", "<errno.h>", "<limits.h>", "<assert.h>" and "<stdlib.h>" (if available). Note: Since Python may define some pre-processor definitions which affect the standard headers on some systems, you *must* include "Python.h" before any standard headers are included. It is recommended to always define "PY_SSIZE_T_CLEAN" before including "Python.h". See Parsing arguments and building values for a description of this macro. All user visible names defined by Python.h (except those defined by the included standard headers) have one of the prefixes "Py" or "_Py". Names beginning with "_Py" are for internal use by the Python implementation and should not be used by extension writers. Structure member names do not have a reserved prefix. Note: User code should never define names that begin with "Py" or "_Py".
This confuses the reader, and jeopardizes the portability of the user code to future Python versions, which may define additional names beginning with one of these prefixes. The header files are typically installed with Python. On Unix, these are located in the directories "*prefix*/include/pythonversion/" and "*exec_prefix*/include/pythonversion/", where "prefix" and "exec_prefix" are defined by the corresponding parameters to Python’s **configure** script and *version* is "'%d.%d' % sys.version_info[:2]". On Windows, the headers are installed in "*prefix*/include", where "prefix" is the installation directory specified to the installer. To include the headers, place both directories (if different) on your compiler’s search path for includes. Do *not* place the parent directories on the search path and then use "#include <pythonX.Y/Python.h>"; this will break on multi-platform builds since the platform independent headers under "prefix" include the platform specific headers from "exec_prefix". C++ users should note that although the API is defined entirely using C, the header files properly declare the entry points to be "extern "C"". As a result, there is no need to do anything special to use the API from C++. Useful macros ============= Several useful macros are defined in the Python header files. Many are defined closer to where they are useful (e.g. "Py_RETURN_NONE"). Others of a more general utility are defined here. This is not necessarily a complete listing. PyMODINIT_FUNC Declare an extension module "PyInit" initialization function. The function return type is PyObject*. The macro declares any special linkage declarations required by the platform, and for C++ declares the function as "extern "C"". The initialization function must be named "PyInit_*name*", where *name* is the name of the module, and should be the only non-"static" item defined in the module file. Example:

   static struct PyModuleDef spam_module = {
       .m_base = PyModuleDef_HEAD_INIT,
       .m_name = "spam",
       ...
   };

   PyMODINIT_FUNC
   PyInit_spam(void)
   {
       return PyModuleDef_Init(&spam_module);
   }

Py_ABS(x) Return the absolute value of "x". Added in version 3.3. Py_ALWAYS_INLINE Ask the compiler to always inline a static inline function. The compiler can ignore it and decide to not inline the function. It can be used to inline performance critical static inline functions when building Python in debug mode with function inlining disabled. For example, MSC disables function inlining when building in debug mode. Blindly marking a static inline function with Py_ALWAYS_INLINE can result in worse performance (due to increased code size, for example). The compiler is usually smarter than the developer for the cost/benefit analysis. If Python is built in debug mode (if the "Py_DEBUG" macro is defined), the "Py_ALWAYS_INLINE" macro does nothing. It must be specified before the function return type. Usage:

   static inline Py_ALWAYS_INLINE int random(void) { return 4; }

Added in version 3.11. Py_CHARMASK(c) Argument must be a character or an integer in the range [-128, 127] or [0, 255]. This macro returns "c" cast to an "unsigned char". Py_DEPRECATED(version) Use this for deprecated declarations. The macro must be placed before the symbol name. Example:

   Py_DEPRECATED(3.8) PyAPI_FUNC(int) Py_OldFunction(void);

Changed in version 3.8: MSVC support was added. Py_GETENV(s) Like "getenv(s)", but returns "NULL" if "-E" was passed on the command line (see "PyConfig.use_environment"). Py_MAX(x, y) Return the maximum value between "x" and "y". Added in version 3.3.
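As a short illustration of how these expression macros compose (the helper function here is hypothetical, not part of the API):

   #include <Python.h>

   /* Hypothetical helper: return the larger of the two magnitudes.
      Py_ABS() yields the absolute value; Py_MAX() picks the larger
      of its two arguments. */
   static long
   larger_magnitude(long a, long b)
   {
       return Py_MAX(Py_ABS(a), Py_ABS(b));
   }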
Py_MEMBER_SIZE(type, member) Return the size of a structure ("type") "member" in bytes. Added in version 3.6. Py_MIN(x, y) Return the minimum value between "x" and "y". Added in version 3.3. Py_NO_INLINE Disable inlining on a function. For example, it reduces the C stack consumption: useful on LTO+PGO builds which heavily inline code (see bpo-33720). Usage:

   Py_NO_INLINE static int random(void) { return 4; }

Added in version 3.11. Py_STRINGIFY(x) Convert "x" to a C string. E.g. "Py_STRINGIFY(123)" returns ""123"". Added in version 3.4. Py_UNREACHABLE() Use this when you have a code path that cannot be reached by design. For example, in the "default:" clause in a "switch" statement for which all possible values are covered in "case" statements. Use this in places where you might be tempted to put an "assert(0)" or "abort()" call. In release mode, the macro helps the compiler to optimize the code, and avoids a warning about unreachable code. For example, the macro is implemented with "__builtin_unreachable()" on GCC in release mode. A use for "Py_UNREACHABLE()" is following a call to a function that never returns but that is not declared "_Py_NO_RETURN". If a code path is very unlikely but can be reached in exceptional cases, this macro must not be used; for example, under low memory conditions or if a system call returns a value out of the expected range. In this case, it’s better to report the error to the caller. If the error cannot be reported to the caller, "Py_FatalError()" can be used. Added in version 3.7. Py_UNUSED(arg) Use this for unused arguments in a function definition to silence compiler warnings. Example: "int func(int a, int Py_UNUSED(b)) { return a; }". Added in version 3.4. PyDoc_STRVAR(name, str) Creates a variable with name "name" that can be used in docstrings. If Python is built without docstrings, the value will be empty. Use "PyDoc_STRVAR" for docstrings to support building Python without docstrings, as specified in **PEP 7**. Example:

   PyDoc_STRVAR(pop_doc, "Remove and return the rightmost element.");

   static PyMethodDef deque_methods[] = {
       // ...
       {"pop", (PyCFunction)deque_pop, METH_NOARGS, pop_doc},
       // ...
   };

PyDoc_STR(str) Creates a docstring for the given input string or an empty string if docstrings are disabled. Use "PyDoc_STR" in specifying docstrings to support building Python without docstrings, as specified in **PEP 7**. Example:

   static PyMethodDef pysqlite_row_methods[] = {
       {"keys", (PyCFunction)pysqlite_row_keys, METH_NOARGS,
        PyDoc_STR("Returns the keys of the row.")},
       {NULL, NULL}
   };

Objects, Types and Reference Counts =================================== Most Python/C API functions have one or more arguments as well as a return value of type PyObject*. This type is a pointer to an opaque data type representing an arbitrary Python object. Since all Python object types are treated the same way by the Python language in most situations (e.g., assignments, scope rules, and argument passing), it is only fitting that they should be represented by a single C type. Almost all Python objects live on the heap: you never declare an automatic or static variable of type "PyObject"; only pointer variables of type PyObject* can be declared. The sole exceptions are the type objects; since these must never be deallocated, they are typically static "PyTypeObject" objects. All Python objects (even Python integers) have a *type* and a *reference count*.
An object’s type determines what kind of object it is (e.g., an integer, a list, or a user-defined function; there are many more as explained in The standard type hierarchy). For each of the well-known types there is a macro to check whether an object is of that type; for instance, "PyList_Check(a)" is true if (and only if) the object pointed to by *a* is a Python list. Reference Counts ---------------- The reference count is important because today’s computers have a finite (and often severely limited) memory size; it counts how many different places there are that have a *strong reference* to an object. Such a place could be another object, or a global (or static) C variable, or a local variable in some C function. When the last *strong reference* to an object is released (i.e. its reference count becomes zero), the object is deallocated. If it contains references to other objects, those references are released. Those other objects may be deallocated in turn, if there are no more references to them, and so on. (There’s an obvious problem with objects that reference each other here; for now, the solution is “don’t do that.”) Reference counts are always manipulated explicitly. The normal way is to use the macro "Py_INCREF()" to take a new reference to an object (i.e. increment its reference count by one), and "Py_DECREF()" to release that reference (i.e. decrement the reference count by one). The "Py_DECREF()" macro is considerably more complex than the incref one, since it must check whether the reference count becomes zero and then cause the object’s deallocator to be called. The deallocator is a function pointer contained in the object’s type structure. The type-specific deallocator takes care of releasing references for other objects contained in the object if this is a compound object type, such as a list, as well as performing any additional finalization that’s needed. There’s no chance that the reference count can overflow; at least as many bits are used to hold the reference count as there are distinct memory locations in virtual memory (assuming "sizeof(Py_ssize_t) >= sizeof(void*)"). Thus, the reference count increment is a simple operation. It is not necessary to hold a *strong reference* (i.e. increment the reference count) for every local variable that contains a pointer to an object. In theory, the object’s reference count goes up by one when the variable is made to point to it and it goes down by one when the variable goes out of scope. However, these two cancel each other out, so at the end the reference count hasn’t changed. The only real reason to use the reference count is to prevent the object from being deallocated as long as our variable is pointing to it. If we know that there is at least one other reference to the object that lives at least as long as our variable, there is no need to take a new *strong reference* (i.e. increment the reference count) temporarily. An important situation where this arises is in objects that are passed as arguments to C functions in an extension module that are called from Python; the call mechanism guarantees to hold a reference to every argument for the duration of the call. However, a common pitfall is to extract an object from a list and hold on to it for a while without taking a new reference. Some other operation might conceivably remove the object from the list, releasing that reference, and possibly deallocating it. 
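A minimal sketch of this pitfall and of its fix; assume *list* is a "PyObject*" pointing to a Python list, with error handling omitted:

   PyObject *item = PyList_GetItem(list, 0);   /* borrowed reference */

   /* Anything that runs arbitrary Python code here might remove the
      object from the list and deallocate it, leaving 'item' dangling.
      The fix: take a strong reference for as long as 'item' is used. */
   Py_INCREF(item);
   /* ... use 'item' safely, even if the list is mutated ... */
   Py_DECREF(item);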
The real danger is that innocent-looking operations may invoke arbitrary Python code which could do this; there is a code path which allows control to flow back to the user from a "Py_DECREF()", so almost any operation is potentially dangerous. A safe approach is to always use the generic operations (functions whose name begins with "PyObject_", "PyNumber_", "PySequence_" or "PyMapping_"). These operations always create a new *strong reference* (i.e. increment the reference count) to the object they return. This leaves the caller with the responsibility to call "Py_DECREF()" when they are done with the result; this soon becomes second nature. Reference Count Details ~~~~~~~~~~~~~~~~~~~~~~~ The reference count behavior of functions in the Python/C API is best explained in terms of *ownership of references*. Ownership pertains to references, never to objects (objects are not owned: they are always shared). “Owning a reference” means being responsible for calling Py_DECREF on it when the reference is no longer needed. Ownership can also be transferred, meaning that the code that receives ownership of the reference then becomes responsible for eventually releasing it by calling "Py_DECREF()" or "Py_XDECREF()" when it’s no longer needed—or passing on this responsibility (usually to its caller). When a function passes ownership of a reference on to its caller, the caller is said to receive a *new* reference. When no ownership is transferred, the caller is said to *borrow* the reference. Nothing needs to be done for a *borrowed reference*. Conversely, when a calling function passes in a reference to an object, there are two possibilities: the function *steals* a reference to the object, or it does not. *Stealing a reference* means that when you pass a reference to a function, that function assumes that it now owns that reference, and you are not responsible for it any longer. Few functions steal references; the two notable exceptions are "PyList_SetItem()" and "PyTuple_SetItem()", which steal a reference to the item (but not to the tuple or list into which the item is put!). These functions were designed to steal a reference because of a common idiom for populating a tuple or list with newly created objects; for example, the code to create the tuple "(1, 2, "three")" could look like this (forgetting about error handling for the moment; a better way to code this is shown below):

   PyObject *t;

   t = PyTuple_New(3);
   PyTuple_SetItem(t, 0, PyLong_FromLong(1L));
   PyTuple_SetItem(t, 1, PyLong_FromLong(2L));
   PyTuple_SetItem(t, 2, PyUnicode_FromString("three"));

Here, "PyLong_FromLong()" returns a new reference which is immediately stolen by "PyTuple_SetItem()". When you want to keep using an object although the reference to it will be stolen, use "Py_INCREF()" to grab another reference before calling the reference-stealing function. Incidentally, "PyTuple_SetItem()" is the *only* way to set tuple items; "PySequence_SetItem()" and "PyObject_SetItem()" refuse to do this since tuples are an immutable data type. You should only use "PyTuple_SetItem()" for tuples that you are creating yourself. Equivalent code for populating a list can be written using "PyList_New()" and "PyList_SetItem()". However, in practice, you will rarely use these ways of creating and populating a tuple or list. There’s a generic function, "Py_BuildValue()", that can create most common objects from C values, directed by a *format string*.
For example, the above two blocks of code could be replaced by the following (which also takes care of the error checking):

   PyObject *tuple, *list;

   tuple = Py_BuildValue("(iis)", 1, 2, "three");
   list = Py_BuildValue("[iis]", 1, 2, "three");

It is much more common to use "PyObject_SetItem()" and friends with items whose references you are only borrowing, like arguments that were passed in to the function you are writing. In that case, their behaviour regarding references is much saner, since you don’t have to take a new reference just so you can give that reference away (“have it be stolen”). For example, this function sets all items of a list (actually, any mutable sequence) to a given item:

   int
   set_all(PyObject *target, PyObject *item)
   {
       Py_ssize_t i, n;

       n = PyObject_Length(target);
       if (n < 0)
           return -1;
       for (i = 0; i < n; i++) {
           PyObject *index = PyLong_FromSsize_t(i);
           if (!index)
               return -1;
           if (PyObject_SetItem(target, index, item) < 0) {
               Py_DECREF(index);
               return -1;
           }
           Py_DECREF(index);
       }
       return 0;
   }

The situation is slightly different for function return values. While passing a reference to most functions does not change your ownership responsibilities for that reference, many functions that return a reference to an object give you ownership of the reference. The reason is simple: in many cases, the returned object is created on the fly, and the reference you get is the only reference to the object. Therefore, the generic functions that return object references, like "PyObject_GetItem()" and "PySequence_GetItem()", always return a new reference (the caller becomes the owner of the reference). It is important to realize that whether you own a reference returned by a function depends only on which function you call — *the plumage* (the type of the object passed as an argument to the function) *doesn’t enter into it!* Thus, if you extract an item from a list using "PyList_GetItem()", you don’t own the reference — but if you obtain the same item from the same list using "PySequence_GetItem()" (which happens to take exactly the same arguments), you do own a reference to the returned object. Here is an example of how you could write a function that computes the sum of the items in a list of integers; once using "PyList_GetItem()", and once using "PySequence_GetItem()".

   long
   sum_list(PyObject *list)
   {
       Py_ssize_t i, n;
       long total = 0, value;
       PyObject *item;

       n = PyList_Size(list);
       if (n < 0)
           return -1; /* Not a list */
       for (i = 0; i < n; i++) {
           item = PyList_GetItem(list, i); /* Can't fail */
           if (!PyLong_Check(item)) continue; /* Skip non-integers */
           value = PyLong_AsLong(item);
           if (value == -1 && PyErr_Occurred())
               /* Integer too big to fit in a C long, bail out */
               return -1;
           total += value;
       }
       return total;
   }

   long
   sum_sequence(PyObject *sequence)
   {
       Py_ssize_t i, n;
       long total = 0, value;
       PyObject *item;

       n = PySequence_Length(sequence);
       if (n < 0)
           return -1; /* Has no length */
       for (i = 0; i < n; i++) {
           item = PySequence_GetItem(sequence, i);
           if (item == NULL)
               return -1; /* Not a sequence, or other failure */
           if (PyLong_Check(item)) {
               value = PyLong_AsLong(item);
               Py_DECREF(item);
               if (value == -1 && PyErr_Occurred())
                   /* Integer too big to fit in a C long, bail out */
                   return -1;
               total += value;
           }
           else {
               Py_DECREF(item); /* Discard reference ownership */
           }
       }
       return total;
   }

Types ----- There are few other data types that play a significant role in the Python/C API; most are simple C types such as int, long, double and char*.
A few structure types are used to describe static tables used to list the functions exported by a module or the data attributes of a new object type, and another is used to describe the value of a complex number. These will be discussed together with the functions that use them. type Py_ssize_t * Part of the Stable ABI.* A signed integral type such that "sizeof(Py_ssize_t) == sizeof(size_t)". C99 doesn’t define such a thing directly (size_t is an unsigned integral type). See **PEP 353** for details. "PY_SSIZE_T_MAX" is the largest positive value of type "Py_ssize_t". Exceptions ========== The Python programmer only needs to deal with exceptions if specific error handling is required; unhandled exceptions are automatically propagated to the caller, then to the caller’s caller, and so on, until they reach the top-level interpreter, where they are reported to the user accompanied by a stack traceback. For C programmers, however, error checking always has to be explicit. All functions in the Python/C API can raise exceptions, unless an explicit claim is made otherwise in a function’s documentation. In general, when a function encounters an error, it sets an exception, discards any object references that it owns, and returns an error indicator. If not documented otherwise, this indicator is either "NULL" or "-1", depending on the function’s return type. A few functions return a Boolean true/false result, with false indicating an error. Very few functions return no explicit error indicator or have an ambiguous return value, and require explicit testing for errors with "PyErr_Occurred()". These exceptions are always explicitly documented. Exception state is maintained in per-thread storage (this is equivalent to using global storage in an unthreaded application). A thread can be in one of two states: an exception has occurred, or not. The function "PyErr_Occurred()" can be used to check for this: it returns a borrowed reference to the exception type object when an exception has occurred, and "NULL" otherwise. There are a number of functions to set the exception state: "PyErr_SetString()" is the most common (though not the most general) function to set the exception state, and "PyErr_Clear()" clears the exception state. The full exception state consists of three objects (all of which can be "NULL"): the exception type, the corresponding exception value, and the traceback. These have the same meanings as the Python result of "sys.exc_info()"; however, they are not the same: the Python objects represent the last exception being handled by a Python "try" … "except" statement, while the C level exception state only exists while an exception is being passed on between C functions until it reaches the Python bytecode interpreter’s main loop, which takes care of transferring it to "sys.exc_info()" and friends. Note that starting with Python 1.5, the preferred, thread-safe way to access the exception state from Python code is to call the function "sys.exc_info()", which returns the per-thread exception state for Python code. Also, the semantics of both ways to access the exception state have changed so that a function which catches an exception will save and restore its thread’s exception state so as to preserve the exception state of its caller. This prevents common bugs in exception handling code caused by an innocent-looking function overwriting the exception being handled; it also reduces the often unwanted lifetime extension for objects that are referenced by the stack frames in the traceback. 
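For example, a minimal sketch of the error-reporting convention described earlier in this section (set an exception, then return an error indicator); the helper function here is hypothetical:

   /* Hypothetical helper: convert a positive C long to a Python int,
      raising ValueError otherwise. */
   static PyObject *
   positive_long(long value)
   {
       if (value <= 0) {
           /* Set the exception state for this thread, report failure. */
           PyErr_SetString(PyExc_ValueError, "value must be positive");
           return NULL;
       }
       return PyLong_FromLong(value);
   }

A caller only has to test the return value for "NULL" and propagate the error; it does not need to call "PyErr_Occurred()" for such a function.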
As a general principle, a function that calls another function to perform some task should check whether the called function raised an exception, and if so, pass the exception state on to its caller. It should discard any object references that it owns, and return an error indicator, but it should *not* set another exception — that would overwrite the exception that was just raised, and lose important information about the exact cause of the error. A simple example of detecting exceptions and passing them on is shown in the "sum_sequence()" example above. It so happens that this example doesn’t need to clean up any owned references when it detects an error. The following example function shows some error cleanup. First, to remind you why you like Python, we show the equivalent Python code:

   def incr_item(dict, key):
       try:
           item = dict[key]
       except KeyError:
           item = 0
       dict[key] = item + 1

Here is the corresponding C code, in all its glory:

   int
   incr_item(PyObject *dict, PyObject *key)
   {
       /* Objects all initialized to NULL for Py_XDECREF */
       PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
       int rv = -1; /* Return value initialized to -1 (failure) */

       item = PyObject_GetItem(dict, key);
       if (item == NULL) {
           /* Handle KeyError only: */
           if (!PyErr_ExceptionMatches(PyExc_KeyError))
               goto error;

           /* Clear the error and use zero: */
           PyErr_Clear();
           item = PyLong_FromLong(0L);
           if (item == NULL)
               goto error;
       }
       const_one = PyLong_FromLong(1L);
       if (const_one == NULL)
           goto error;

       incremented_item = PyNumber_Add(item, const_one);
       if (incremented_item == NULL)
           goto error;

       if (PyObject_SetItem(dict, key, incremented_item) < 0)
           goto error;
       rv = 0; /* Success */
       /* Continue with cleanup code */

   error:
       /* Cleanup code, shared by success and failure path */

       /* Use Py_XDECREF() to ignore NULL references */
       Py_XDECREF(item);
       Py_XDECREF(const_one);
       Py_XDECREF(incremented_item);

       return rv; /* -1 for error, 0 for success */
   }

This example represents an endorsed use of the "goto" statement in C! It illustrates the use of "PyErr_ExceptionMatches()" and "PyErr_Clear()" to handle specific exceptions, and the use of "Py_XDECREF()" to dispose of owned references that may be "NULL" (note the "'X'" in the name; "Py_DECREF()" would crash when confronted with a "NULL" reference). It is important that the variables used to hold owned references are initialized to "NULL" for this to work; likewise, the proposed return value is initialized to "-1" (failure) and only set to success after the final call made is successful. Embedding Python ================ The one important task that only embedders (as opposed to extension writers) of the Python interpreter have to worry about is the initialization, and possibly the finalization, of the Python interpreter. Most functionality of the interpreter can only be used after the interpreter has been initialized. The basic initialization function is "Py_Initialize()". This initializes the table of loaded modules, and creates the fundamental modules "builtins", "__main__", and "sys". It also initializes the module search path ("sys.path"). "Py_Initialize()" does not set the “script argument list” ("sys.argv"). If this variable is needed by Python code that will be executed later, "PyConfig.argv" and "PyConfig.parse_argv" must be set: see Python Initialization Configuration.
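A minimal sketch of one way to do this, assuming the embedding application wants "sys.argv" passed through to Python code unmodified (the argument strings here are purely illustrative):

   PyConfig config;
   PyConfig_InitPythonConfig(&config);

   /* Hand the application's arguments to Python. With parse_argv
      set to 0, they become sys.argv exactly as given, instead of
      being parsed like a regular Python command line. */
   wchar_t *argv[] = {L"my_app", L"input.txt"};
   config.parse_argv = 0;
   PyStatus status = PyConfig_SetArgv(&config, 2, argv);
   if (!PyStatus_Exception(status)) {
       status = Py_InitializeFromConfig(&config);
   }
   PyConfig_Clear(&config);
   if (PyStatus_Exception(status)) {
       Py_ExitStatusException(status);
   }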
On most systems (in particular, on Unix and Windows, although the details are slightly different), "Py_Initialize()" calculates the module search path based upon its best guess for the location of the standard Python interpreter executable, assuming that the Python library is found in a fixed location relative to the Python interpreter executable. In particular, it looks for a directory named "lib/python*X.Y*" relative to the parent directory where the executable named "python" is found on the shell command search path (the environment variable "PATH"). For instance, if the Python executable is found in "/usr/local/bin/python", it will assume that the libraries are in "/usr/local/lib/python*X.Y*". (In fact, this particular path is also the “fallback” location, used when no executable file named "python" is found along "PATH".) The user can override this behavior by setting the environment variable "PYTHONHOME", or insert additional directories in front of the standard path by setting "PYTHONPATH". The embedding application can steer the search by setting "PyConfig.program_name" *before* calling "Py_InitializeFromConfig()". Note that "PYTHONHOME" still overrides this and "PYTHONPATH" is still inserted in front of the standard path. An application that requires total control has to provide its own implementation of "Py_GetPath()", "Py_GetPrefix()", "Py_GetExecPrefix()", and "Py_GetProgramFullPath()" (all defined in "Modules/getpath.c"). Sometimes, it is desirable to “uninitialize” Python. For instance, the application may want to start over (make another call to "Py_Initialize()") or the application is simply done with its use of Python and wants to free memory allocated by Python. This can be accomplished by calling "Py_FinalizeEx()". The function "Py_IsInitialized()" returns true if Python is currently in the initialized state. More information about these functions is given in a later chapter. Notice that "Py_FinalizeEx()" does *not* free all memory allocated by the Python interpreter, e.g. memory allocated by extension modules currently cannot be released. Debugging Builds ================ Python can be built with several macros to enable extra checks of the interpreter and extension modules. These checks tend to add a large amount of overhead to the runtime so they are not enabled by default. A full list of the various types of debugging builds is in the file "Misc/SpecialBuilds.txt" in the Python source distribution. Builds are available that support tracing of reference counts, debugging the memory allocator, or low-level profiling of the main interpreter loop. Only the most frequently used builds will be described in the remainder of this section. Py_DEBUG Compiling the interpreter with the "Py_DEBUG" macro defined produces what is generally meant by a debug build of Python. "Py_DEBUG" is enabled in the Unix build by adding "--with-pydebug" to the "./configure" command. It is also implied by the presence of the not-Python-specific "_DEBUG" macro. When "Py_DEBUG" is enabled in the Unix build, compiler optimization is disabled. In addition to the reference count debugging described below, extra checks are performed; see Python Debug Build. Defining "Py_TRACE_REFS" enables reference tracing (see the "configure --with-trace-refs option"). When defined, a circular doubly linked list of active objects is maintained by adding two extra fields to every "PyObject". Total allocations are tracked as well. Upon exit, all existing references are printed.
(In interactive mode this happens after every statement run by the interpreter.) Please refer to "Misc/SpecialBuilds.txt" in the Python source distribution for more detailed information. Recommended third party tools ============================= The following third party tools offer both simpler and more sophisticated approaches to creating C, C++ and Rust extensions for Python: * Cython * cffi * HPy * nanobind (C++) * Numba * pybind11 (C++) * PyO3 (Rust) * SWIG Using tools such as these can help avoid writing code that is tightly bound to a particular version of CPython, avoid reference counting errors, and focus more on your own code than on using the CPython API. In general, new versions of Python can be supported by updating the tool, and your code will often use newer and more efficient APIs automatically. Some tools also support compiling for other implementations of Python from a single set of sources. These projects are not supported by the same people who maintain Python, and issues need to be raised with the projects directly. Remember to check that the project is still maintained and supported, as the list above may become outdated. See also: Python Packaging User Guide: Binary Extensions The Python Packaging User Guide not only covers several available tools that simplify the creation of binary extensions, but also discusses the various reasons why creating an extension module may be desirable in the first place. Iterator Protocol ***************** There are two functions specifically for working with iterators. int PyIter_Check(PyObject *o) * Part of the Stable ABI since version 3.8.* Return non-zero if the object *o* can be safely passed to "PyIter_Next()", and "0" otherwise. This function always succeeds. int PyAIter_Check(PyObject *o) * Part of the Stable ABI since version 3.10.* Return non-zero if the object *o* provides the "AsyncIterator" protocol, and "0" otherwise. This function always succeeds. Added in version 3.10. PyObject *PyIter_Next(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Return the next value from the iterator *o*. The object must be an iterator according to "PyIter_Check()" (it is up to the caller to check this). If there are no remaining values, returns "NULL" with no exception set. If an error occurs while retrieving the item, returns "NULL" and passes along the exception. To write a loop which iterates over an iterator, the C code should look something like this:

   PyObject *iterator = PyObject_GetIter(obj);
   PyObject *item;

   if (iterator == NULL) {
       /* propagate error */
   }

   while ((item = PyIter_Next(iterator))) {
       /* do something with item */
       ...
       /* release reference when done */
       Py_DECREF(item);
   }

   Py_DECREF(iterator);

   if (PyErr_Occurred()) {
       /* propagate error */
   }
   else {
       /* continue doing useful work */
   }

type PySendResult The enum value used to represent different results of "PyIter_Send()". Added in version 3.10. PySendResult PyIter_Send(PyObject *iter, PyObject *arg, PyObject **presult) * Part of the Stable ABI since version 3.10.* Sends the *arg* value into the iterator *iter*. Returns: * "PYGEN_RETURN" if the iterator returns. The return value is returned via *presult*. * "PYGEN_NEXT" if the iterator yields. The yielded value is returned via *presult*. * "PYGEN_ERROR" if the iterator has raised an exception. *presult* is set to "NULL". Added in version 3.10. Iterator Objects **************** Python provides two general-purpose iterator objects.
The first, a sequence iterator, works with an arbitrary sequence supporting the "__getitem__()" method. The second works with a callable object and a sentinel value, calling the callable for each item in the sequence, and ending the iteration when the sentinel value is returned. PyTypeObject PySeqIter_Type * Part of the Stable ABI.* Type object for iterator objects returned by "PySeqIter_New()" and the one-argument form of the "iter()" built-in function for built- in sequence types. int PySeqIter_Check(PyObject *op) Return true if the type of *op* is "PySeqIter_Type". This function always succeeds. PyObject *PySeqIter_New(PyObject *seq) *Return value: New reference.** Part of the Stable ABI.* Return an iterator that works with a general sequence object, *seq*. The iteration ends when the sequence raises "IndexError" for the subscripting operation. PyTypeObject PyCallIter_Type * Part of the Stable ABI.* Type object for iterator objects returned by "PyCallIter_New()" and the two-argument form of the "iter()" built-in function. int PyCallIter_Check(PyObject *op) Return true if the type of *op* is "PyCallIter_Type". This function always succeeds. PyObject *PyCallIter_New(PyObject *callable, PyObject *sentinel) *Return value: New reference.** Part of the Stable ABI.* Return a new iterator. The first parameter, *callable*, can be any Python callable object that can be called with no parameters; each call to it should return the next item in the iteration. When *callable* returns a value equal to *sentinel*, the iteration will be terminated. List Objects ************ type PyListObject This subtype of "PyObject" represents a Python list object. PyTypeObject PyList_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python list type. This is the same object as "list" in the Python layer. int PyList_Check(PyObject *p) Return true if *p* is a list object or an instance of a subtype of the list type. This function always succeeds. int PyList_CheckExact(PyObject *p) Return true if *p* is a list object, but not an instance of a subtype of the list type. This function always succeeds. PyObject *PyList_New(Py_ssize_t len) *Return value: New reference.** Part of the Stable ABI.* Return a new list of length *len* on success, or "NULL" on failure. Note: If *len* is greater than zero, the returned list object’s items are set to "NULL". Thus you cannot use abstract API functions such as "PySequence_SetItem()" or expose the object to Python code before setting all items to a real object with "PyList_SetItem()" or "PyList_SET_ITEM()". The following APIs are safe APIs before the list is fully initialized: "PyList_SetItem()" and "PyList_SET_ITEM()". Py_ssize_t PyList_Size(PyObject *list) * Part of the Stable ABI.* Return the length of the list object in *list*; this is equivalent to "len(list)" on a list object. Py_ssize_t PyList_GET_SIZE(PyObject *list) Similar to "PyList_Size()", but without error checking. PyObject *PyList_GetItemRef(PyObject *list, Py_ssize_t index) *Return value: New reference.** Part of the Stable ABI since version 3.13.* Return the object at position *index* in the list pointed to by *list*. The position must be non-negative; indexing from the end of the list is not supported. If *index* is out of bounds ("<0 or >=len(list)"), return "NULL" and set an "IndexError" exception. Added in version 3.13. 
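A minimal sketch of the resulting ownership, assuming *list* refers to a Python list inside a function that returns "-1" on error:

   PyObject *item = PyList_GetItemRef(list, 0);
   if (item == NULL) {
       return -1;  /* IndexError (or another exception) is set */
   }
   /* 'item' is a strong reference: it stays valid even if the list
      is mutated in the meantime. Release it when done. */
   Py_DECREF(item);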
PyObject *PyList_GetItem(PyObject *list, Py_ssize_t index) *Return value: Borrowed reference.** Part of the Stable ABI.* Like "PyList_GetItemRef()", but returns a *borrowed reference* instead of a *strong reference*.

PyObject *PyList_GET_ITEM(PyObject *list, Py_ssize_t i) *Return value: Borrowed reference.* Similar to "PyList_GetItem()", but without error checking.

int PyList_SetItem(PyObject *list, Py_ssize_t index, PyObject *item) * Part of the Stable ABI.* Set the item at index *index* in list to *item*. Return "0" on success. If *index* is out of bounds, return "-1" and set an "IndexError" exception.

Note: This function “steals” a reference to *item* and discards a reference to an item already in the list at the affected position.

void PyList_SET_ITEM(PyObject *list, Py_ssize_t i, PyObject *item) Macro form of "PyList_SetItem()" without error checking. This is normally only used to fill in new lists where there is no previous content. Bounds checking is performed as an assertion if Python is built in debug mode or "with assertions".

Note: This macro “steals” a reference to *item*, and, unlike "PyList_SetItem()", does *not* discard a reference to any item that is being replaced; any reference in *list* at position *i* will be leaked.

int PyList_Insert(PyObject *list, Py_ssize_t index, PyObject *item) * Part of the Stable ABI.* Insert the item *item* into list *list* in front of index *index*. Return "0" if successful; return "-1" and set an exception if unsuccessful. Analogous to "list.insert(index, item)".

int PyList_Append(PyObject *list, PyObject *item) * Part of the Stable ABI.* Append the object *item* at the end of list *list*. Return "0" if successful; return "-1" and set an exception if unsuccessful. Analogous to "list.append(item)".

PyObject *PyList_GetSlice(PyObject *list, Py_ssize_t low, Py_ssize_t high) *Return value: New reference.** Part of the Stable ABI.* Return a list containing the objects in *list* *between* *low* and *high*. Return "NULL" and set an exception if unsuccessful. Analogous to "list[low:high]". Indexing from the end of the list is not supported.

int PyList_SetSlice(PyObject *list, Py_ssize_t low, Py_ssize_t high, PyObject *itemlist) * Part of the Stable ABI.* Set the slice of *list* between *low* and *high* to the contents of *itemlist*. Analogous to "list[low:high] = itemlist". The *itemlist* may be "NULL", indicating the assignment of an empty list (slice deletion). Return "0" on success, "-1" on failure. Indexing from the end of the list is not supported.

int PyList_Extend(PyObject *list, PyObject *iterable) Extend *list* with the contents of *iterable*. This is the same as "PyList_SetSlice(list, PY_SSIZE_T_MAX, PY_SSIZE_T_MAX, iterable)" and analogous to "list.extend(iterable)" or "list += iterable". Raise an exception and return "-1" if *list* is not a "list" object. Return 0 on success. Added in version 3.13.

int PyList_Clear(PyObject *list) Remove all items from *list*. This is the same as "PyList_SetSlice(list, 0, PY_SSIZE_T_MAX, NULL)" and analogous to "list.clear()" or "del list[:]". Raise an exception and return "-1" if *list* is not a "list" object. Return 0 on success. Added in version 3.13.

int PyList_Sort(PyObject *list) * Part of the Stable ABI.* Sort the items of *list* in place. Return "0" on success, "-1" on failure. This is equivalent to "list.sort()".

int PyList_Reverse(PyObject *list) * Part of the Stable ABI.* Reverse the items of *list* in place. Return "0" on success, "-1" on failure.
This is the equivalent of "list.reverse()".

PyObject *PyList_AsTuple(PyObject *list) *Return value: New reference.** Part of the Stable ABI.* Return a new tuple object containing the contents of *list*; equivalent to "tuple(list)".

Integer Objects
***************

All integers are implemented as “long” integer objects of arbitrary size. On error, most "PyLong_As*" APIs return "(return type)-1" which cannot be distinguished from a number. Use "PyErr_Occurred()" to disambiguate.

type PyLongObject * Part of the Limited API (as an opaque struct).* This subtype of "PyObject" represents a Python integer object.

PyTypeObject PyLong_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python integer type. This is the same object as "int" in the Python layer.

int PyLong_Check(PyObject *p) Return true if its argument is a "PyLongObject" or a subtype of "PyLongObject". This function always succeeds.

int PyLong_CheckExact(PyObject *p) Return true if its argument is a "PyLongObject", but not a subtype of "PyLongObject". This function always succeeds.

PyObject *PyLong_FromLong(long v) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyLongObject" object from *v*, or "NULL" on failure. The current implementation keeps an array of integer objects for all integers between "-5" and "256". When you create an int in that range you actually just get back a reference to the existing object.

PyObject *PyLong_FromUnsignedLong(unsigned long v) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyLongObject" object from a C unsigned long, or "NULL" on failure.

PyObject *PyLong_FromSsize_t(Py_ssize_t v) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyLongObject" object from a C "Py_ssize_t", or "NULL" on failure.

PyObject *PyLong_FromSize_t(size_t v) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyLongObject" object from a C "size_t", or "NULL" on failure.

PyObject *PyLong_FromLongLong(long long v) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyLongObject" object from a C long long, or "NULL" on failure.

PyObject *PyLong_FromUnsignedLongLong(unsigned long long v) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyLongObject" object from a C unsigned long long, or "NULL" on failure.

PyObject *PyLong_FromDouble(double v) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyLongObject" object from the integer part of *v*, or "NULL" on failure.

PyObject *PyLong_FromString(const char *str, char **pend, int base) *Return value: New reference.** Part of the Stable ABI.* Return a new "PyLongObject" based on the string value in *str*, which is interpreted according to the radix in *base*, or "NULL" on failure. If *pend* is non-"NULL", **pend* will point to the end of *str* on success or to the first character that could not be processed on error. If *base* is "0", *str* is interpreted using the Integer literals definition; in this case, leading zeros in a non-zero decimal number raise a "ValueError". If *base* is not "0", it must be between "2" and "36", inclusive. Leading and trailing whitespace and single underscores after a base specifier and between digits are ignored. If there are no digits or *str* is not NULL-terminated following the digits and trailing whitespace, "ValueError" will be raised.

See also: Python methods "int.to_bytes()" and "int.from_bytes()" to convert a "PyLongObject" to/from an array of bytes in base "256".
You can call those from C using "PyObject_CallMethod()". PyObject *PyLong_FromUnicodeObject(PyObject *u, int base) *Return value: New reference.* Convert a sequence of Unicode digits in the string *u* to a Python integer value. Added in version 3.3. PyObject *PyLong_FromVoidPtr(void *p) *Return value: New reference.** Part of the Stable ABI.* Create a Python integer from the pointer *p*. The pointer value can be retrieved from the resulting value using "PyLong_AsVoidPtr()". PyObject *PyLong_FromNativeBytes(const void *buffer, size_t n_bytes, int flags) Create a Python integer from the value contained in the first *n_bytes* of *buffer*, interpreted as a two’s-complement signed number. *flags* are as for "PyLong_AsNativeBytes()". Passing "-1" will select the native endian that CPython was compiled with and assume that the most-significant bit is a sign bit. Passing "Py_ASNATIVEBYTES_UNSIGNED_BUFFER" will produce the same result as calling "PyLong_FromUnsignedNativeBytes()". Other flags are ignored. Added in version 3.13. PyObject *PyLong_FromUnsignedNativeBytes(const void *buffer, size_t n_bytes, int flags) Create a Python integer from the value contained in the first *n_bytes* of *buffer*, interpreted as an unsigned number. *flags* are as for "PyLong_AsNativeBytes()". Passing "-1" will select the native endian that CPython was compiled with and assume that the most-significant bit is not a sign bit. Flags other than endian are ignored. Added in version 3.13. long PyLong_AsLong(PyObject *obj) * Part of the Stable ABI.* Return a C long representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" method (if present) to convert it to a "PyLongObject". Raise "OverflowError" if the value of *obj* is out of range for a long. Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Changed in version 3.10: This function will no longer use "__int__()". long PyLong_AS_LONG(PyObject *obj) A *soft deprecated* alias. Exactly equivalent to the preferred "PyLong_AsLong". In particular, it can fail with "OverflowError" or another exception. Deprecated since version 3.14: The function is soft deprecated. int PyLong_AsInt(PyObject *obj) * Part of the Stable ABI since version 3.13.* Similar to "PyLong_AsLong()", but store the result in a C int instead of a C long. Added in version 3.13. long PyLong_AsLongAndOverflow(PyObject *obj, int *overflow) * Part of the Stable ABI.* Return a C long representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" method (if present) to convert it to a "PyLongObject". If the value of *obj* is greater than "LONG_MAX" or less than "LONG_MIN", set **overflow* to "1" or "-1", respectively, and return "-1"; otherwise, set **overflow* to "0". If any other exception occurs set **overflow* to "0" and return "-1" as usual. Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Changed in version 3.10: This function will no longer use "__int__()". long long PyLong_AsLongLong(PyObject *obj) * Part of the Stable ABI.* Return a C long long representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" method (if present) to convert it to a "PyLongObject". Raise "OverflowError" if the value of *obj* is out of range for a long long. Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. 
Changed in version 3.10: This function will no longer use "__int__()".

long long PyLong_AsLongLongAndOverflow(PyObject *obj, int *overflow) * Part of the Stable ABI.* Return a C long long representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" method (if present) to convert it to a "PyLongObject". If the value of *obj* is greater than "LLONG_MAX" or less than "LLONG_MIN", set **overflow* to "1" or "-1", respectively, and return "-1"; otherwise, set **overflow* to "0". If any other exception occurs set **overflow* to "0" and return "-1" as usual. Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. Added in version 3.2. Changed in version 3.8: Use "__index__()" if available. Changed in version 3.10: This function will no longer use "__int__()".

Py_ssize_t PyLong_AsSsize_t(PyObject *pylong) * Part of the Stable ABI.* Return a C "Py_ssize_t" representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for a "Py_ssize_t". Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate.

unsigned long PyLong_AsUnsignedLong(PyObject *pylong) * Part of the Stable ABI.* Return a C unsigned long representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for an unsigned long. Returns "(unsigned long)-1" on error. Use "PyErr_Occurred()" to disambiguate.

size_t PyLong_AsSize_t(PyObject *pylong) * Part of the Stable ABI.* Return a C "size_t" representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for a "size_t". Returns "(size_t)-1" on error. Use "PyErr_Occurred()" to disambiguate.

unsigned long long PyLong_AsUnsignedLongLong(PyObject *pylong) * Part of the Stable ABI.* Return a C unsigned long long representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for an unsigned long long. Returns "(unsigned long long)-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.1: A negative *pylong* now raises "OverflowError", not "TypeError".

unsigned long PyLong_AsUnsignedLongMask(PyObject *obj) * Part of the Stable ABI.* Return a C unsigned long representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" method (if present) to convert it to a "PyLongObject". If the value of *obj* is out of range for an unsigned long, return the reduction of that value modulo "ULONG_MAX + 1". Returns "(unsigned long)-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Changed in version 3.10: This function will no longer use "__int__()".

unsigned long long PyLong_AsUnsignedLongLongMask(PyObject *obj) * Part of the Stable ABI.* Return a C unsigned long long representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" method (if present) to convert it to a "PyLongObject". If the value of *obj* is out of range for an unsigned long long, return the reduction of that value modulo "ULLONG_MAX + 1". Returns "(unsigned long long)-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Changed in version 3.10: This function will no longer use "__int__()".
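Since "-1" is both a legitimate result and the error indicator for the "PyLong_As*" family, a caller typically disambiguates with "PyErr_Occurred()" as described above. A minimal sketch (the helper name "as_long_checked" is hypothetical, not part of the API):

   static int
   as_long_checked(PyObject *obj, long *out)
   {
       long value = PyLong_AsLong(obj);
       if (value == -1 && PyErr_Occurred()) {
           /* Could be OverflowError, TypeError, etc.; the exception is set. */
           return -1;
       }
       *out = value;  /* -1 without an exception set is a genuine result */
       return 0;
   }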
double PyLong_AsDouble(PyObject *pylong) * Part of the Stable ABI.* Return a C double representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for a double. Returns "-1.0" on error. Use "PyErr_Occurred()" to disambiguate.

void *PyLong_AsVoidPtr(PyObject *pylong) * Part of the Stable ABI.* Convert a Python integer *pylong* to a C void pointer. If *pylong* cannot be converted, an "OverflowError" will be raised. This is only assured to produce a usable void pointer for values created with "PyLong_FromVoidPtr()". Returns "NULL" on error. Use "PyErr_Occurred()" to disambiguate.

Py_ssize_t PyLong_AsNativeBytes(PyObject *pylong, void *buffer, Py_ssize_t n_bytes, int flags) Copy the Python integer value *pylong* to a native *buffer* of size *n_bytes*. The *flags* can be set to "-1" to behave similarly to a C cast, or to values documented below to control the behavior. Returns "-1" with an exception raised on error. This may happen if *pylong* cannot be interpreted as an integer, or if *pylong* was negative and the "Py_ASNATIVEBYTES_REJECT_NEGATIVE" flag was set. Otherwise, returns the number of bytes required to store the value. If this is equal to or less than *n_bytes*, the entire value was copied. All *n_bytes* of the buffer are written: large buffers are padded with zeroes. If the returned value is greater than *n_bytes*, the value was truncated: as many of the lowest bits of the value as could fit are written, and the higher bits are ignored. This matches the typical behavior of a C-style downcast.

Note: Overflow is not considered an error. If the returned value is larger than *n_bytes*, most significant bits were discarded. "0" will never be returned. Values are always copied as two’s-complement.

Usage example:

   int32_t value;
   Py_ssize_t bytes = PyLong_AsNativeBytes(pylong, &value, sizeof(value), -1);
   if (bytes < 0) {
       // Failed. A Python exception was set with the reason.
       return NULL;
   }
   else if (bytes <= (Py_ssize_t)sizeof(value)) {
       // Success!
   }
   else {
       // Overflow occurred, but 'value' contains the truncated
       // lowest bits of pylong.
   }

Passing zero to *n_bytes* will return the size of a buffer that would be large enough to hold the value. This may be larger than technically necessary, but not unreasonably so. If *n_bytes=0*, *buffer* may be "NULL".

Note: Passing *n_bytes=0* to this function is not an accurate way to determine the bit length of the value.

To get at the entire Python value of an unknown size, the function can be called twice: first to determine the buffer size, then to fill it:

   // Ask how much space we need.
   Py_ssize_t expected = PyLong_AsNativeBytes(pylong, NULL, 0, -1);
   if (expected < 0) {
       // Failed. A Python exception was set with the reason.
       return NULL;
   }
   assert(expected != 0);  // Impossible per the API definition.
   uint8_t *bignum = malloc(expected);
   if (!bignum) {
       PyErr_SetString(PyExc_MemoryError, "bignum malloc failed.");
       return NULL;
   }
   // Safely get the entire value.
   Py_ssize_t bytes = PyLong_AsNativeBytes(pylong, bignum, expected, -1);
   if (bytes < 0) {
       // Exception has been set.
       free(bignum);
       return NULL;
   }
   else if (bytes > expected) {
       // This should not be possible.
       PyErr_SetString(PyExc_RuntimeError,
           "Unexpected bignum truncation after a size check.");
       free(bignum);
       return NULL;
   }
   // The expected success given the above pre-check.
   // ... use bignum ...
   free(bignum);

*flags* is either "-1" ("Py_ASNATIVEBYTES_DEFAULTS") to select defaults that behave most like a C cast, or a combination of the other flags in the table below. Note that "-1" cannot be combined with other flags. Currently, "-1" corresponds to "Py_ASNATIVEBYTES_NATIVE_ENDIAN | Py_ASNATIVEBYTES_UNSIGNED_BUFFER".

+-----------------------------------------------+--------+
| Flag                                          | Value  |
|===============================================|========|
| Py_ASNATIVEBYTES_DEFAULTS                     | "-1"   |
+-----------------------------------------------+--------+
| Py_ASNATIVEBYTES_BIG_ENDIAN                   | "0"    |
+-----------------------------------------------+--------+
| Py_ASNATIVEBYTES_LITTLE_ENDIAN                | "1"    |
+-----------------------------------------------+--------+
| Py_ASNATIVEBYTES_NATIVE_ENDIAN                | "3"    |
+-----------------------------------------------+--------+
| Py_ASNATIVEBYTES_UNSIGNED_BUFFER              | "4"    |
+-----------------------------------------------+--------+
| Py_ASNATIVEBYTES_REJECT_NEGATIVE              | "8"    |
+-----------------------------------------------+--------+
| Py_ASNATIVEBYTES_ALLOW_INDEX                  | "16"   |
+-----------------------------------------------+--------+

Specifying "Py_ASNATIVEBYTES_NATIVE_ENDIAN" will override any other endian flags. Passing "2" is reserved.

By default, sufficient buffer will be requested to include a sign bit. For example, when converting 128 with *n_bytes=1*, the function will return 2 (or more) in order to store a zero sign bit.

If "Py_ASNATIVEBYTES_UNSIGNED_BUFFER" is specified, a zero sign bit will be omitted from size calculations. This allows, for example, 128 to fit in a single-byte buffer. If the destination buffer is later treated as signed, a positive input value may become negative. Note that the flag does not affect handling of negative values: for those, space for a sign bit is always requested.

Specifying "Py_ASNATIVEBYTES_REJECT_NEGATIVE" causes an exception to be set if *pylong* is negative. Without this flag, negative values will be copied provided there is enough space for at least one sign bit, regardless of whether "Py_ASNATIVEBYTES_UNSIGNED_BUFFER" was specified.

If "Py_ASNATIVEBYTES_ALLOW_INDEX" is specified and a non-integer value is passed, its "__index__()" method will be called first. This may result in Python code executing and other threads being allowed to run, which could cause changes to other objects or values in use. When *flags* is "-1", this option is not set, and non-integer values will raise "TypeError".

Note: With the default *flags* ("-1", or *UNSIGNED_BUFFER* without *REJECT_NEGATIVE*), multiple Python integers can map to a single value without overflow. For example, both "255" and "-1" fit a single-byte buffer and set all its bits. This matches typical C cast behavior.

Added in version 3.13.

PyObject *PyLong_GetInfo(void) * Part of the Stable ABI.* On success, return a read-only *named tuple* that holds information about Python’s internal representation of integers. See "sys.int_info" for description of individual fields. On failure, return "NULL" with an exception set. Added in version 3.1.

int PyUnstable_Long_IsCompact(const PyLongObject *op) *This is Unstable API. It may change without warning in minor releases.* Return 1 if *op* is compact, 0 otherwise. This function makes it possible for performance-critical code to implement a “fast path” for small integers. For compact values use "PyUnstable_Long_CompactValue()"; for others fall back to a "PyLong_As*" function or "PyLong_AsNativeBytes()".
The speedup is expected to be negligible for most users. Exactly what values are considered compact is an implementation detail and is subject to change. Added in version 3.12. Py_ssize_t PyUnstable_Long_CompactValue(const PyLongObject *op) *This is Unstable API. It may change without warning in minor releases.* If *op* is compact, as determined by "PyUnstable_Long_IsCompact()", return its value. Otherwise, the return value is undefined. Added in version 3.12. Mapping Protocol **************** See also "PyObject_GetItem()", "PyObject_SetItem()" and "PyObject_DelItem()". int PyMapping_Check(PyObject *o) * Part of the Stable ABI.* Return "1" if the object provides the mapping protocol or supports slicing, and "0" otherwise. Note that it returns "1" for Python classes with a "__getitem__()" method, since in general it is impossible to determine what type of keys the class supports. This function always succeeds. Py_ssize_t PyMapping_Size(PyObject *o) Py_ssize_t PyMapping_Length(PyObject *o) * Part of the Stable ABI.* Returns the number of keys in object *o* on success, and "-1" on failure. This is equivalent to the Python expression "len(o)". PyObject *PyMapping_GetItemString(PyObject *o, const char *key) *Return value: New reference.** Part of the Stable ABI.* This is the same as "PyObject_GetItem()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. int PyMapping_GetOptionalItem(PyObject *obj, PyObject *key, PyObject **result) * Part of the Stable ABI since version 3.13.* Variant of "PyObject_GetItem()" which doesn’t raise "KeyError" if the key is not found. If the key is found, return "1" and set **result* to a new *strong reference* to the corresponding value. If the key is not found, return "0" and set **result* to "NULL"; the "KeyError" is silenced. If an error other than "KeyError" is raised, return "-1" and set **result* to "NULL". Added in version 3.13. int PyMapping_GetOptionalItemString(PyObject *obj, const char *key, PyObject **result) * Part of the Stable ABI since version 3.13.* This is the same as "PyMapping_GetOptionalItem()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. Added in version 3.13. int PyMapping_SetItemString(PyObject *o, const char *key, PyObject *v) * Part of the Stable ABI.* This is the same as "PyObject_SetItem()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. int PyMapping_DelItem(PyObject *o, PyObject *key) This is an alias of "PyObject_DelItem()". int PyMapping_DelItemString(PyObject *o, const char *key) This is the same as "PyObject_DelItem()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. int PyMapping_HasKeyWithError(PyObject *o, PyObject *key) * Part of the Stable ABI since version 3.13.* Return "1" if the mapping object has the key *key* and "0" otherwise. This is equivalent to the Python expression "key in o". On failure, return "-1". Added in version 3.13. int PyMapping_HasKeyStringWithError(PyObject *o, const char *key) * Part of the Stable ABI since version 3.13.* This is the same as "PyMapping_HasKeyWithError()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. Added in version 3.13. int PyMapping_HasKey(PyObject *o, PyObject *key) * Part of the Stable ABI.* Return "1" if the mapping object has the key *key* and "0" otherwise. This is equivalent to the Python expression "key in o". This function always succeeds. 
Note: Exceptions which occur when this calls the "__getitem__()" method are silently ignored. For proper error handling, use "PyMapping_HasKeyWithError()", "PyMapping_GetOptionalItem()" or "PyObject_GetItem()" instead.

int PyMapping_HasKeyString(PyObject *o, const char *key) * Part of the Stable ABI.* This is the same as "PyMapping_HasKey()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*.

Note: Exceptions that occur when this calls the "__getitem__()" method or while creating the temporary "str" object are silently ignored. For proper error handling, use "PyMapping_HasKeyStringWithError()", "PyMapping_GetOptionalItemString()" or "PyMapping_GetItemString()" instead.

PyObject *PyMapping_Keys(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* On success, return a list of the keys in object *o*. On failure, return "NULL". Changed in version 3.7: Previously, the function returned a list or a tuple.

PyObject *PyMapping_Values(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* On success, return a list of the values in object *o*. On failure, return "NULL". Changed in version 3.7: Previously, the function returned a list or a tuple.

PyObject *PyMapping_Items(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* On success, return a list of the items in object *o*, where each item is a tuple containing a key-value pair. On failure, return "NULL". Changed in version 3.7: Previously, the function returned a list or a tuple.

Data marshalling support
************************

These routines allow C code to work with serialized objects using the same data format as the "marshal" module. There are functions to write data into the serialization format, and additional functions that can be used to read the data back. Files used to store marshalled data must be opened in binary mode.

Numeric values are stored with the least significant byte first.

The module supports several versions of the data format: version 0 is the historical version, version 1 shares interned strings (both in the file and upon unmarshalling), and version 2 uses a binary format for floating-point numbers. "Py_MARSHAL_VERSION" indicates the current file format (currently 2).

void PyMarshal_WriteLongToFile(long value, FILE *file, int version) Marshal a long integer, *value*, to *file*. This will only write the least-significant 32 bits of *value*, regardless of the size of the native long type. *version* indicates the file format. This function can fail, in which case it sets the error indicator. Use "PyErr_Occurred()" to check for that.

void PyMarshal_WriteObjectToFile(PyObject *value, FILE *file, int version) Marshal a Python object, *value*, to *file*. *version* indicates the file format. This function can fail, in which case it sets the error indicator. Use "PyErr_Occurred()" to check for that.

PyObject *PyMarshal_WriteObjectToString(PyObject *value, int version) *Return value: New reference.* Return a bytes object containing the marshalled representation of *value*. *version* indicates the file format.

The following functions allow marshalled values to be read back in.

long PyMarshal_ReadLongFromFile(FILE *file) Return a C long from the data stream in a FILE* opened for reading. Only a 32-bit value can be read in using this function, regardless of the native size of long. On error, sets the appropriate exception ("EOFError") and returns "-1".

int PyMarshal_ReadShortFromFile(FILE *file) Return a C short from the data stream in a FILE* opened for reading.
Only a 16-bit value can be read in using this function, regardless of the native size of short. On error, sets the appropriate exception ("EOFError") and returns "-1".

PyObject *PyMarshal_ReadObjectFromFile(FILE *file) *Return value: New reference.* Return a Python object from the data stream in a FILE* opened for reading. On error, sets the appropriate exception ("EOFError", "ValueError" or "TypeError") and returns "NULL".

PyObject *PyMarshal_ReadLastObjectFromFile(FILE *file) *Return value: New reference.* Return a Python object from the data stream in a FILE* opened for reading. Unlike "PyMarshal_ReadObjectFromFile()", this function assumes that no further objects will be read from the file, allowing it to aggressively load file data into memory so that the de-serialization can operate from data in memory rather than reading a byte at a time from the file. Only use this variant if you are certain that you won’t be reading anything else from the file. On error, sets the appropriate exception ("EOFError", "ValueError" or "TypeError") and returns "NULL".

PyObject *PyMarshal_ReadObjectFromString(const char *data, Py_ssize_t len) *Return value: New reference.* Return a Python object from the data stream in a byte buffer containing *len* bytes pointed to by *data*. On error, sets the appropriate exception ("EOFError", "ValueError" or "TypeError") and returns "NULL".

Memory Management
*****************

Overview
========

Memory management in Python involves a private heap containing all Python objects and data structures. The management of this private heap is ensured internally by the *Python memory manager*. The Python memory manager has different components which deal with various dynamic storage management aspects, like sharing, segmentation, preallocation or caching.

At the lowest level, a raw memory allocator ensures that there is enough room in the private heap for storing all Python-related data by interacting with the memory manager of the operating system. On top of the raw memory allocator, several object-specific allocators operate on the same heap and implement distinct memory management policies adapted to the peculiarities of every object type. For example, integer objects are managed differently within the heap than strings, tuples or dictionaries because integers imply different storage requirements and speed/space tradeoffs. The Python memory manager thus delegates some of the work to the object-specific allocators, but ensures that the latter operate within the bounds of the private heap.

It is important to understand that the management of the Python heap is performed by the interpreter itself and that the user has no control over it, even if they regularly manipulate object pointers to memory blocks inside that heap. The allocation of heap space for Python objects and other internal buffers is performed on demand by the Python memory manager through the Python/C API functions listed in this document.

To avoid memory corruption, extension writers should never try to operate on Python objects with the functions exported by the C library: "malloc()", "calloc()", "realloc()" and "free()". This will result in mixed calls between the C allocator and the Python memory manager with fatal consequences, because they implement different algorithms and operate on different heaps.
However, one may safely allocate and release memory blocks with the C library allocator for individual purposes, as shown in the following example:

   PyObject *res;
   char *buf = (char *) malloc(BUFSIZ); /* for I/O */

   if (buf == NULL)
       return PyErr_NoMemory();
   ...Do some I/O operation involving buf...
   res = PyBytes_FromString(buf);
   free(buf); /* malloc'ed */
   return res;

In this example, the memory request for the I/O buffer is handled by the C library allocator. The Python memory manager is involved only in the allocation of the bytes object returned as a result.

In most situations, however, it is recommended to allocate memory from the Python heap specifically because the latter is under control of the Python memory manager. For example, this is required when the interpreter is extended with new object types written in C. Another reason for using the Python heap is the desire to *inform* the Python memory manager about the memory needs of the extension module. Even when the requested memory is used exclusively for internal, highly specific purposes, delegating all memory requests to the Python memory manager causes the interpreter to have a more accurate image of its memory footprint as a whole. Consequently, under certain circumstances, the Python memory manager may trigger appropriate actions, like garbage collection, memory compaction or other preventive procedures. Note that by using the C library allocator as shown in the previous example, the allocated memory for the I/O buffer completely escapes the Python memory manager.

See also: The "PYTHONMALLOC" environment variable can be used to configure the memory allocators used by Python. The "PYTHONMALLOCSTATS" environment variable can be used to print statistics of the pymalloc memory allocator every time a new pymalloc object arena is created, and on shutdown.

Allocator Domains
=================

All allocating functions belong to one of three different “domains” (see also "PyMemAllocatorDomain"). These domains represent different allocation strategies and are optimized for different purposes. The specific details on how every domain allocates memory or what internal functions each domain calls is considered an implementation detail, but for debugging purposes a simplified table can be found here.

The APIs used to allocate and free a block of memory must be from the same domain. For example, "PyMem_Free()" must be used to free memory allocated using "PyMem_Malloc()".

The three allocation domains are:

* Raw domain: intended for allocating memory for general-purpose memory buffers where the allocation *must* go to the system allocator or where the allocator can operate without the *GIL*. The memory is requested directly from the system. See Raw Memory Interface.

* “Mem” domain: intended for allocating memory for Python buffers and general-purpose memory buffers where the allocation must be performed with the *GIL* held. The memory is taken from the Python private heap. See Memory Interface.

* Object domain: intended for allocating memory for Python objects. The memory is taken from the Python private heap. See Object allocators.

Note: The *free-threaded* build requires that only Python objects are allocated using the “object” domain and that all Python objects are allocated using that domain.
This differs from prior Python versions, where this was only a best practice and not a hard requirement. For example, buffers (non-Python objects) should be allocated using "PyMem_Malloc()", "PyMem_RawMalloc()", or "malloc()", but not "PyObject_Malloc()". See Memory Allocation APIs.

Raw Memory Interface
====================

The following function sets are wrappers to the system allocator. These functions are thread-safe; the *GIL* does not need to be held.

The default raw memory allocator uses the following functions: "malloc()", "calloc()", "realloc()" and "free()"; it calls "malloc(1)" (or "calloc(1, 1)") when requesting zero bytes.

Added in version 3.4.

void *PyMem_RawMalloc(size_t n) * Part of the Stable ABI since version 3.13.* Allocates *n* bytes and returns a pointer of type void* to the allocated memory, or "NULL" if the request fails. Requesting zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyMem_RawMalloc(1)" had been called instead. The memory will not have been initialized in any way.

void *PyMem_RawCalloc(size_t nelem, size_t elsize) * Part of the Stable ABI since version 3.13.* Allocates *nelem* elements each whose size in bytes is *elsize* and returns a pointer of type void* to the allocated memory, or "NULL" if the request fails. The memory is initialized to zeros. Requesting zero elements or elements of size zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyMem_RawCalloc(1, 1)" had been called instead. Added in version 3.5.

void *PyMem_RawRealloc(void *p, size_t n) * Part of the Stable ABI since version 3.13.* Resizes the memory block pointed to by *p* to *n* bytes. The contents will be unchanged to the minimum of the old and the new sizes. If *p* is "NULL", the call is equivalent to "PyMem_RawMalloc(n)"; else if *n* is equal to zero, the memory block is resized but is not freed, and the returned pointer is non-"NULL". Unless *p* is "NULL", it must have been returned by a previous call to "PyMem_RawMalloc()", "PyMem_RawRealloc()" or "PyMem_RawCalloc()". If the request fails, "PyMem_RawRealloc()" returns "NULL" and *p* remains a valid pointer to the previous memory area.

void PyMem_RawFree(void *p) * Part of the Stable ABI since version 3.13.* Frees the memory block pointed to by *p*, which must have been returned by a previous call to "PyMem_RawMalloc()", "PyMem_RawRealloc()" or "PyMem_RawCalloc()". Otherwise, or if "PyMem_RawFree(p)" has been called before, undefined behavior occurs. If *p* is "NULL", no operation is performed.

Memory Interface
================

The following function sets, modeled after the ANSI C standard, but specifying behavior when requesting zero bytes, are available for allocating and releasing memory from the Python heap.

The default memory allocator uses the pymalloc memory allocator.

Warning: The *GIL* must be held when using these functions.

Changed in version 3.6: The default allocator is now pymalloc instead of system "malloc()".

void *PyMem_Malloc(size_t n) * Part of the Stable ABI.* Allocates *n* bytes and returns a pointer of type void* to the allocated memory, or "NULL" if the request fails. Requesting zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyMem_Malloc(1)" had been called instead. The memory will not have been initialized in any way.
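As a short, hypothetical sketch of the domain-pairing rule from the Allocator Domains section (the helper name "copy_double_array" is invented for the example): a block obtained from "PyMem_Malloc()" must be released with "PyMem_Free()", never with "free()" or a function from another domain, and the *GIL* must be held throughout:

   /* memcpy() requires <string.h>. */
   static int
   copy_double_array(const double *src, Py_ssize_t n)
   {
       double *tmp = (double *) PyMem_Malloc((size_t)n * sizeof(double));
       if (tmp == NULL) {
           PyErr_NoMemory();
           return -1;
       }
       memcpy(tmp, src, (size_t)n * sizeof(double));
       /* ... work with tmp ... */
       PyMem_Free(tmp);   /* same domain as the allocation */
       return 0;
   }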
void *PyMem_Calloc(size_t nelem, size_t elsize) * Part of the Stable ABI since version 3.7.* Allocates *nelem* elements each whose size in bytes is *elsize* and returns a pointer of type void* to the allocated memory, or "NULL" if the request fails. The memory is initialized to zeros. Requesting zero elements or elements of size zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyMem_Calloc(1, 1)" had been called instead. Added in version 3.5. void *PyMem_Realloc(void *p, size_t n) * Part of the Stable ABI.* Resizes the memory block pointed to by *p* to *n* bytes. The contents will be unchanged to the minimum of the old and the new sizes. If *p* is "NULL", the call is equivalent to "PyMem_Malloc(n)"; else if *n* is equal to zero, the memory block is resized but is not freed, and the returned pointer is non-"NULL". Unless *p* is "NULL", it must have been returned by a previous call to "PyMem_Malloc()", "PyMem_Realloc()" or "PyMem_Calloc()". If the request fails, "PyMem_Realloc()" returns "NULL" and *p* remains a valid pointer to the previous memory area. void PyMem_Free(void *p) * Part of the Stable ABI.* Frees the memory block pointed to by *p*, which must have been returned by a previous call to "PyMem_Malloc()", "PyMem_Realloc()" or "PyMem_Calloc()". Otherwise, or if "PyMem_Free(p)" has been called before, undefined behavior occurs. If *p* is "NULL", no operation is performed. The following type-oriented macros are provided for convenience. Note that *TYPE* refers to any C type. PyMem_New(TYPE, n) Same as "PyMem_Malloc()", but allocates "(n * sizeof(TYPE))" bytes of memory. Returns a pointer cast to "TYPE*". The memory will not have been initialized in any way. PyMem_Resize(p, TYPE, n) Same as "PyMem_Realloc()", but the memory block is resized to "(n * sizeof(TYPE))" bytes. Returns a pointer cast to "TYPE*". On return, *p* will be a pointer to the new memory area, or "NULL" in the event of failure. This is a C preprocessor macro; *p* is always reassigned. Save the original value of *p* to avoid losing memory when handling errors. void PyMem_Del(void *p) Same as "PyMem_Free()". In addition, the following macro sets are provided for calling the Python memory allocator directly, without involving the C API functions listed above. However, note that their use does not preserve binary compatibility across Python versions and is therefore deprecated in extension modules. * "PyMem_MALLOC(size)" * "PyMem_NEW(type, size)" * "PyMem_REALLOC(ptr, size)" * "PyMem_RESIZE(ptr, type, size)" * "PyMem_FREE(ptr)" * "PyMem_DEL(ptr)" Object allocators ================= The following function sets, modeled after the ANSI C standard, but specifying behavior when requesting zero bytes, are available for allocating and releasing memory from the Python heap. Note: There is no guarantee that the memory returned by these allocators can be successfully cast to a Python object when intercepting the allocating functions in this domain by the methods described in the Customize Memory Allocators section. The default object allocator uses the pymalloc memory allocator. Warning: The *GIL* must be held when using these functions. void *PyObject_Malloc(size_t n) * Part of the Stable ABI.* Allocates *n* bytes and returns a pointer of type void* to the allocated memory, or "NULL" if the request fails. Requesting zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyObject_Malloc(1)" had been called instead. The memory will not have been initialized in any way. 
void *PyObject_Calloc(size_t nelem, size_t elsize) * Part of the Stable ABI since version 3.7.* Allocates *nelem* elements each whose size in bytes is *elsize* and returns a pointer of type void* to the allocated memory, or "NULL" if the request fails. The memory is initialized to zeros. Requesting zero elements or elements of size zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyObject_Calloc(1, 1)" had been called instead. Added in version 3.5.

void *PyObject_Realloc(void *p, size_t n) * Part of the Stable ABI.* Resizes the memory block pointed to by *p* to *n* bytes. The contents will be unchanged to the minimum of the old and the new sizes. If *p* is "NULL", the call is equivalent to "PyObject_Malloc(n)"; else if *n* is equal to zero, the memory block is resized but is not freed, and the returned pointer is non-"NULL". Unless *p* is "NULL", it must have been returned by a previous call to "PyObject_Malloc()", "PyObject_Realloc()" or "PyObject_Calloc()". If the request fails, "PyObject_Realloc()" returns "NULL" and *p* remains a valid pointer to the previous memory area.

void PyObject_Free(void *p) * Part of the Stable ABI.* Frees the memory block pointed to by *p*, which must have been returned by a previous call to "PyObject_Malloc()", "PyObject_Realloc()" or "PyObject_Calloc()". Otherwise, or if "PyObject_Free(p)" has been called before, undefined behavior occurs. If *p* is "NULL", no operation is performed.

Default Memory Allocators
=========================

Default memory allocators:

+---------------------------------+----------------------+--------------------+-----------------------+----------------------+
| Configuration                   | Name                 | PyMem_RawMalloc    | PyMem_Malloc          | PyObject_Malloc      |
|=================================|======================|====================|=======================|======================|
| Release build                   | ""pymalloc""         | "malloc"           | "pymalloc"            | "pymalloc"           |
+---------------------------------+----------------------+--------------------+-----------------------+----------------------+
| Debug build                     | ""pymalloc_debug""   | "malloc" + debug   | "pymalloc" + debug    | "pymalloc" + debug   |
+---------------------------------+----------------------+--------------------+-----------------------+----------------------+
| Release build, without pymalloc | ""malloc""           | "malloc"           | "malloc"              | "malloc"             |
+---------------------------------+----------------------+--------------------+-----------------------+----------------------+
| Debug build, without pymalloc   | ""malloc_debug""     | "malloc" + debug   | "malloc" + debug      | "malloc" + debug     |
+---------------------------------+----------------------+--------------------+-----------------------+----------------------+

Legend:

* Name: value for "PYTHONMALLOC" environment variable.

* "malloc": system allocators from the standard C library, C functions: "malloc()", "calloc()", "realloc()" and "free()".

* "pymalloc": pymalloc memory allocator.

* "mimalloc": mimalloc memory allocator. The pymalloc allocator will be used if mimalloc support isn’t available.

* “+ debug”: with debug hooks on the Python memory allocators.

* “Debug build”: Python build in debug mode.

Customize Memory Allocators
===========================

Added in version 3.4.

type PyMemAllocatorEx Structure used to describe a memory block allocator.
The structure has the following fields:

+------------------------------------------------------------+-----------------------------------------+
| Field                                                      | Meaning                                 |
|============================================================|=========================================|
| "void *ctx"                                                | user context passed as first argument   |
+------------------------------------------------------------+-----------------------------------------+
| "void* malloc(void *ctx, size_t size)"                     | allocate a memory block                 |
+------------------------------------------------------------+-----------------------------------------+
| "void* calloc(void *ctx, size_t nelem, size_t elsize)"     | allocate a memory block initialized     |
|                                                            | with zeros                              |
+------------------------------------------------------------+-----------------------------------------+
| "void* realloc(void *ctx, void *ptr, size_t new_size)"     | allocate or resize a memory block       |
+------------------------------------------------------------+-----------------------------------------+
| "void free(void *ctx, void *ptr)"                          | free a memory block                     |
+------------------------------------------------------------+-----------------------------------------+

Changed in version 3.5: The "PyMemAllocator" structure was renamed to "PyMemAllocatorEx" and a new "calloc" field was added.

type PyMemAllocatorDomain Enum used to identify an allocator domain. Domains:

PYMEM_DOMAIN_RAW Functions:

* "PyMem_RawMalloc()"
* "PyMem_RawRealloc()"
* "PyMem_RawCalloc()"
* "PyMem_RawFree()"

PYMEM_DOMAIN_MEM Functions:

* "PyMem_Malloc()"
* "PyMem_Realloc()"
* "PyMem_Calloc()"
* "PyMem_Free()"

PYMEM_DOMAIN_OBJ Functions:

* "PyObject_Malloc()"
* "PyObject_Realloc()"
* "PyObject_Calloc()"
* "PyObject_Free()"

void PyMem_GetAllocator(PyMemAllocatorDomain domain, PyMemAllocatorEx *allocator) Get the memory block allocator of the specified domain.

void PyMem_SetAllocator(PyMemAllocatorDomain domain, PyMemAllocatorEx *allocator) Set the memory block allocator of the specified domain. The new allocator must return a distinct non-"NULL" pointer when requesting zero bytes. For the "PYMEM_DOMAIN_RAW" domain, the allocator must be thread-safe: the *GIL* is not held when the allocator is called. For the remaining domains, the allocator must also be thread-safe: the allocator may be called in different interpreters that do not share a "GIL". If the new allocator is not a hook (does not call the previous allocator), the "PyMem_SetupDebugHooks()" function must be called to reinstall the debug hooks on top of the new allocator. See also "PyPreConfig.allocator" and Preinitialize Python with PyPreConfig.

Warning: "PyMem_SetAllocator()" has the following contract:

* It can be called after "Py_PreInitialize()" and before "Py_InitializeFromConfig()" to install a custom memory allocator. There are no restrictions over the installed allocator other than the ones imposed by the domain (for instance, the Raw Domain allows the allocator to be called without the GIL held). See the section on allocator domains for more information.

* If called after Python has finished initializing (after "Py_InitializeFromConfig()" has been called) the allocator **must** wrap the existing allocator. Substituting the current allocator for some other arbitrary one is **not supported**.

Changed in version 3.12: All allocators must be thread-safe.

void PyMem_SetupDebugHooks(void) Set up debug hooks in the Python memory allocators to detect memory errors.
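To make the hook contract concrete, here is a hypothetical sketch (the "counting_*" names and the statistic are invented for the example, not part of the API) of an allocator that wraps, rather than replaces, the existing raw-domain allocator. Because it calls the previous allocator, reinstalling the debug hooks is not required:

   #include <Python.h>

   static PyMemAllocatorEx old_alloc;   /* the allocator being wrapped */
   static size_t alloc_count = 0;       /* illustrative statistic */

   static void *
   counting_malloc(void *ctx, size_t size)
   {
       alloc_count++;
       return old_alloc.malloc(old_alloc.ctx, size);
   }

   static void *
   counting_calloc(void *ctx, size_t nelem, size_t elsize)
   {
       alloc_count++;
       return old_alloc.calloc(old_alloc.ctx, nelem, elsize);
   }

   static void *
   counting_realloc(void *ctx, void *ptr, size_t new_size)
   {
       return old_alloc.realloc(old_alloc.ctx, ptr, new_size);
   }

   static void
   counting_free(void *ctx, void *ptr)
   {
       old_alloc.free(old_alloc.ctx, ptr);
   }

   static void
   install_counting_allocator(void)
   {
       /* Field order: ctx, malloc, calloc, realloc, free. */
       PyMemAllocatorEx hook = {NULL, counting_malloc, counting_calloc,
                                counting_realloc, counting_free};
       PyMem_GetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
       PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &hook);
   }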
Debug hooks on the Python memory allocators
===========================================

When Python is built in debug mode, the "PyMem_SetupDebugHooks()" function is called at the Python preinitialization to set up debug hooks on Python memory allocators to detect memory errors.

The "PYTHONMALLOC" environment variable can be used to install debug hooks on a Python compiled in release mode (ex: "PYTHONMALLOC=debug").

The "PyMem_SetupDebugHooks()" function can be used to set debug hooks after calling "PyMem_SetAllocator()".

These debug hooks fill dynamically allocated memory blocks with special, recognizable bit patterns. Newly allocated memory is filled with the byte "0xCD" ("PYMEM_CLEANBYTE"), freed memory is filled with the byte "0xDD" ("PYMEM_DEADBYTE"). Memory blocks are surrounded by “forbidden bytes” filled with the byte "0xFD" ("PYMEM_FORBIDDENBYTE"). Strings of these bytes are unlikely to be valid addresses, floats, or ASCII strings.

Runtime checks:

* Detect API violations. For example, detect if "PyObject_Free()" is called on a memory block allocated by "PyMem_Malloc()".

* Detect write before the start of the buffer (buffer underflow).

* Detect write after the end of the buffer (buffer overflow).

* Check that the *GIL* is held when allocator functions of "PYMEM_DOMAIN_OBJ" (ex: "PyObject_Malloc()") and "PYMEM_DOMAIN_MEM" (ex: "PyMem_Malloc()") domains are called.

On error, the debug hooks use the "tracemalloc" module to get the traceback where a memory block was allocated. The traceback is only displayed if "tracemalloc" is tracing Python memory allocations and the memory block was traced.

Let *S* = "sizeof(size_t)". "2*S" bytes are added at each end of each block of *N* bytes requested. The memory layout is like so, where p represents the address returned by a malloc-like or realloc-like function ("p[i:j]" means the slice of bytes from "*(p+i)" inclusive up to "*(p+j)" exclusive; note that the treatment of negative indices differs from a Python slice):

"p[-2*S:-S]" Number of bytes originally asked for. This is a size_t, big-endian (easier to read in a memory dump).

"p[-S]" API identifier (ASCII character):

* "'r'" for "PYMEM_DOMAIN_RAW".
* "'m'" for "PYMEM_DOMAIN_MEM".
* "'o'" for "PYMEM_DOMAIN_OBJ".

"p[-S+1:0]" Copies of PYMEM_FORBIDDENBYTE. Used to catch under-writes and reads.

"p[0:N]" The requested memory, filled with copies of PYMEM_CLEANBYTE, used to catch reference to uninitialized memory. When a realloc-like function is called requesting a larger memory block, the new excess bytes are also filled with PYMEM_CLEANBYTE. When a free-like function is called, these are overwritten with PYMEM_DEADBYTE, to catch reference to freed memory. When a realloc-like function is called requesting a smaller memory block, the excess old bytes are also filled with PYMEM_DEADBYTE.

"p[N:N+S]" Copies of PYMEM_FORBIDDENBYTE. Used to catch over-writes and reads.

"p[N+S:N+2*S]" Only used if the "PYMEM_DEBUG_SERIALNO" macro is defined (not defined by default). A serial number, incremented by 1 on each call to a malloc-like or realloc-like function. Big-endian "size_t". If “bad memory” is detected later, the serial number gives an excellent way to set a breakpoint on the next run, to capture the instant at which this block was passed out. The static function bumpserialno() in obmalloc.c is the only place the serial number is incremented, and exists so you can set such a breakpoint easily.

A realloc-like or free-like function first checks that the PYMEM_FORBIDDENBYTE bytes at each end are intact.
If they’ve been altered, diagnostic output is written to stderr, and the program is aborted via Py_FatalError(). The other main failure mode is provoking a memory error when a program picks up one of the special bit patterns and tries to use it as an address. If you then break into a debugger and look at the object, you’re likely to see that it’s entirely filled with PYMEM_DEADBYTE (meaning freed memory is getting used) or PYMEM_CLEANBYTE (meaning uninitialized memory is getting used).

Changed in version 3.6: The "PyMem_SetupDebugHooks()" function now also works on Python compiled in release mode. On error, the debug hooks now use "tracemalloc" to get the traceback where a memory block was allocated. The debug hooks now also check if the GIL is held when functions of "PYMEM_DOMAIN_OBJ" and "PYMEM_DOMAIN_MEM" domains are called.

Changed in version 3.8: Byte patterns "0xCB" ("PYMEM_CLEANBYTE"), "0xDB" ("PYMEM_DEADBYTE") and "0xFB" ("PYMEM_FORBIDDENBYTE") have been replaced with "0xCD", "0xDD" and "0xFD" to use the same values as the Windows CRT debug "malloc()" and "free()".

The pymalloc allocator
======================

Python has a *pymalloc* allocator optimized for small objects (smaller than or equal to 512 bytes) with a short lifetime. It uses memory mappings called “arenas” with a fixed size of either 256 KiB on 32-bit platforms or 1 MiB on 64-bit platforms. It falls back to "PyMem_RawMalloc()" and "PyMem_RawRealloc()" for allocations larger than 512 bytes.

*pymalloc* is the default allocator of the "PYMEM_DOMAIN_MEM" (ex: "PyMem_Malloc()") and "PYMEM_DOMAIN_OBJ" (ex: "PyObject_Malloc()") domains.

The arena allocator uses the following functions:

* "VirtualAlloc()" and "VirtualFree()" on Windows,
* "mmap()" and "munmap()" if available,
* "malloc()" and "free()" otherwise.

This allocator is disabled if Python is configured with the "--without-pymalloc" option. It can also be disabled at runtime using the "PYTHONMALLOC" environment variable (ex: "PYTHONMALLOC=malloc").

Customize pymalloc Arena Allocator
----------------------------------

Added in version 3.4.

type PyObjectArenaAllocator Structure used to describe an arena allocator. The structure has three fields:

+----------------------------------------------------+-----------------------------------------+
| Field                                              | Meaning                                 |
|====================================================|=========================================|
| "void *ctx"                                        | user context passed as first argument   |
+----------------------------------------------------+-----------------------------------------+
| "void* alloc(void *ctx, size_t size)"              | allocate an arena of size bytes         |
+----------------------------------------------------+-----------------------------------------+
| "void free(void *ctx, void *ptr, size_t size)"     | free an arena                           |
+----------------------------------------------------+-----------------------------------------+

void PyObject_GetArenaAllocator(PyObjectArenaAllocator *allocator) Get the arena allocator.

void PyObject_SetArenaAllocator(PyObjectArenaAllocator *allocator) Set the arena allocator.

The mimalloc allocator
======================

Added in version 3.13.

Python supports the mimalloc allocator when the underlying platform support is available. mimalloc “is a general purpose allocator with excellent performance characteristics. Initially developed by Daan Leijen for the runtime systems of the Koka and Lean languages.”

tracemalloc C API
=================

Added in version 3.7.
int PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr, size_t size) Track an allocated memory block in the "tracemalloc" module. Return "0" on success, return "-1" on error (failed to allocate memory to store the trace). Return "-2" if tracemalloc is disabled. If the memory block is already tracked, update the existing trace.

int PyTraceMalloc_Untrack(unsigned int domain, uintptr_t ptr) Untrack an allocated memory block in the "tracemalloc" module. Do nothing if the block was not tracked. Return "-2" if tracemalloc is disabled, otherwise return "0".

Examples
========

Here is the example from section Overview, rewritten so that the I/O buffer is allocated from the Python heap by using the first function set:

   PyObject *res;
   char *buf = (char *) PyMem_Malloc(BUFSIZ); /* for I/O */

   if (buf == NULL)
       return PyErr_NoMemory();
   /* ...Do some I/O operation involving buf... */
   res = PyBytes_FromString(buf);
   PyMem_Free(buf); /* allocated with PyMem_Malloc */
   return res;

The same code using the type-oriented function set:

   PyObject *res;
   char *buf = PyMem_New(char, BUFSIZ); /* for I/O */

   if (buf == NULL)
       return PyErr_NoMemory();
   /* ...Do some I/O operation involving buf... */
   res = PyBytes_FromString(buf);
   PyMem_Del(buf); /* allocated with PyMem_New */
   return res;

Note that in the two examples above, the buffer is always manipulated via functions belonging to the same set. Indeed, it is required to use the same memory API family for a given memory block, so that the risk of mixing different allocators is reduced to a minimum. The following code sequence contains two errors, one of which is labeled as *fatal* because it mixes two different allocators operating on different heaps.

   char *buf1 = PyMem_New(char, BUFSIZ);
   char *buf2 = (char *) malloc(BUFSIZ);
   char *buf3 = (char *) PyMem_Malloc(BUFSIZ);
   ...
   PyMem_Del(buf3);  /* Wrong -- should be PyMem_Free() */
   free(buf2);       /* Right -- allocated via malloc() */
   free(buf1);       /* Fatal -- should be PyMem_Del()  */

In addition to the functions aimed at handling raw memory blocks from the Python heap, objects in Python are allocated and released with "PyObject_New()", "PyObject_NewVar()" and "PyObject_Del()". These will be explained in the next chapter on defining and implementing new object types in C.

MemoryView objects
******************

A "memoryview" object exposes the C level buffer interface as a Python object which can then be passed around like any other object.

PyObject *PyMemoryView_FromObject(PyObject *obj) *Return value: New reference.** Part of the Stable ABI.* Create a memoryview object from an object that provides the buffer interface. If *obj* supports writable buffer exports, the memoryview object will be read/write, otherwise it may be either read-only or read/write at the discretion of the exporter.

PyBUF_READ Flag to request a readonly buffer.

PyBUF_WRITE Flag to request a writable buffer.

PyObject *PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Create a memoryview object using *mem* as the underlying buffer. *flags* can be one of "PyBUF_READ" or "PyBUF_WRITE". Added in version 3.3.

PyObject *PyMemoryView_FromBuffer(const Py_buffer *view) *Return value: New reference.** Part of the Stable ABI since version 3.11.* Create a memoryview object wrapping the given buffer structure *view*. For simple byte buffers, "PyMemoryView_FromMemory()" is the preferred function.
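As a brief, hypothetical illustration of "PyMemoryView_FromMemory()" (the buffer and function names are invented for the example), a static C buffer can be exposed to Python as a read-only view. The buffer is not copied, so it must stay alive as long as the view does:

   static char greeting[] = "hello from C";

   static PyObject *
   make_view(void)
   {
       /* The buffer is not copied; it must outlive the returned view. */
       return PyMemoryView_FromMemory(greeting, sizeof(greeting) - 1,
                                      PyBUF_READ);
   }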
PyObject *PyMemoryView_GetContiguous(PyObject *obj, int buffertype, char order) *Return value: New reference.** Part of the Stable ABI.* Create a memoryview object to a *contiguous* chunk of memory (in either ‘C’ or ‘F’ortran *order*) from an object that defines the buffer interface. If memory is contiguous, the memoryview object points to the original memory. Otherwise, a copy is made and the memoryview points to a new bytes object. *buffertype* can be one of "PyBUF_READ" or "PyBUF_WRITE". int PyMemoryView_Check(PyObject *obj) Return true if the object *obj* is a memoryview object. It is not currently allowed to create subclasses of "memoryview". This function always succeeds. Py_buffer *PyMemoryView_GET_BUFFER(PyObject *mview) Return a pointer to the memoryview’s private copy of the exporter’s buffer. *mview* **must** be a memoryview instance; this macro doesn’t check its type, you must do it yourself or you will risk crashes. PyObject *PyMemoryView_GET_BASE(PyObject *mview) Return either a pointer to the exporting object that the memoryview is based on or "NULL" if the memoryview has been created by one of the functions "PyMemoryView_FromMemory()" or "PyMemoryView_FromBuffer()". *mview* **must** be a memoryview instance. Instance Method Objects *********************** An instance method is a wrapper for a "PyCFunction" and the new way to bind a "PyCFunction" to a class object. It replaces the former call "PyMethod_New(func, NULL, class)". PyTypeObject PyInstanceMethod_Type This instance of "PyTypeObject" represents the Python instance method type. It is not exposed to Python programs. int PyInstanceMethod_Check(PyObject *o) Return true if *o* is an instance method object (has type "PyInstanceMethod_Type"). The parameter must not be "NULL". This function always succeeds. PyObject *PyInstanceMethod_New(PyObject *func) *Return value: New reference.* Return a new instance method object, with *func* being any callable object. *func* is the function that will be called when the instance method is called. PyObject *PyInstanceMethod_Function(PyObject *im) *Return value: Borrowed reference.* Return the function object associated with the instance method *im*. PyObject *PyInstanceMethod_GET_FUNCTION(PyObject *im) *Return value: Borrowed reference.* Macro version of "PyInstanceMethod_Function()" which avoids error checking. Method Objects ************** Methods are bound function objects. Methods are always bound to an instance of a user-defined class. Unbound methods (methods bound to a class object) are no longer available. PyTypeObject PyMethod_Type This instance of "PyTypeObject" represents the Python method type. This is exposed to Python programs as "types.MethodType". int PyMethod_Check(PyObject *o) Return true if *o* is a method object (has type "PyMethod_Type"). The parameter must not be "NULL". This function always succeeds. PyObject *PyMethod_New(PyObject *func, PyObject *self) *Return value: New reference.* Return a new method object, with *func* being any callable object and *self* the instance the method should be bound. *func* is the function that will be called when the method is called. *self* must not be "NULL". PyObject *PyMethod_Function(PyObject *meth) *Return value: Borrowed reference.* Return the function object associated with the method *meth*. PyObject *PyMethod_GET_FUNCTION(PyObject *meth) *Return value: Borrowed reference.* Macro version of "PyMethod_Function()" which avoids error checking. 
PyObject *PyMethod_Self(PyObject *meth) *Return value: Borrowed reference.* Return the instance associated with the method *meth*. PyObject *PyMethod_GET_SELF(PyObject *meth) *Return value: Borrowed reference.* Macro version of "PyMethod_Self()" which avoids error checking. Module Objects ************** PyTypeObject PyModule_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python module type. This is exposed to Python programs as "types.ModuleType". int PyModule_Check(PyObject *p) Return true if *p* is a module object, or a subtype of a module object. This function always succeeds. int PyModule_CheckExact(PyObject *p) Return true if *p* is a module object, but not a subtype of "PyModule_Type". This function always succeeds. PyObject *PyModule_NewObject(PyObject *name) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Return a new module object with "module.__name__" set to *name*. The module’s "__name__", "__doc__", "__package__" and "__loader__" attributes are filled in (all but "__name__" are set to "None"). The caller is responsible for setting a "__file__" attribute. Return "NULL" with an exception set on error. Added in version 3.3. Changed in version 3.4: "__package__" and "__loader__" are now set to "None". PyObject *PyModule_New(const char *name) *Return value: New reference.** Part of the Stable ABI.* Similar to "PyModule_NewObject()", but the name is a UTF-8 encoded string instead of a Unicode object. PyObject *PyModule_GetDict(PyObject *module) *Return value: Borrowed reference.** Part of the Stable ABI.* Return the dictionary object that implements *module*’s namespace; this object is the same as the "__dict__" attribute of the module object. If *module* is not a module object (or a subtype of a module object), "SystemError" is raised and "NULL" is returned. It is recommended extensions use other "PyModule_*" and "PyObject_*" functions rather than directly manipulate a module’s "__dict__". PyObject *PyModule_GetNameObject(PyObject *module) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Return *module*’s "__name__" value. If the module does not provide one, or if it is not a string, "SystemError" is raised and "NULL" is returned. Added in version 3.3. const char *PyModule_GetName(PyObject *module) * Part of the Stable ABI.* Similar to "PyModule_GetNameObject()" but return the name encoded to "'utf-8'". void *PyModule_GetState(PyObject *module) * Part of the Stable ABI.* Return the “state” of the module, that is, a pointer to the block of memory allocated at module creation time, or "NULL". See "PyModuleDef.m_size". PyModuleDef *PyModule_GetDef(PyObject *module) * Part of the Stable ABI.* Return a pointer to the "PyModuleDef" struct from which the module was created, or "NULL" if the module wasn’t created from a definition. PyObject *PyModule_GetFilenameObject(PyObject *module) *Return value: New reference.** Part of the Stable ABI.* Return the name of the file from which *module* was loaded using *module*’s "__file__" attribute. If this is not defined, or if it is not a string, raise "SystemError" and return "NULL"; otherwise return a reference to a Unicode object. Added in version 3.2. const char *PyModule_GetFilename(PyObject *module) * Part of the Stable ABI.* Similar to "PyModule_GetFilenameObject()" but return the filename encoded to ‘utf-8’. Deprecated since version 3.2: "PyModule_GetFilename()" raises "UnicodeEncodeError" on unencodable filenames, use "PyModule_GetFilenameObject()" instead. 
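As a rough illustration of these functions, the following hypothetical sketch creates a bare module with "PyModule_New()" and supplies the "__file__" attribute that, as noted above, the caller is responsible for setting:

   static PyObject *
   make_module(void)
   {
       PyObject *mod = PyModule_New("example");  /* sets __name__ */
       if (mod == NULL) {
           return NULL;
       }
       /* PyModule_New() leaves __file__ unset; provide one ourselves. */
       if (PyModule_AddStringConstant(mod, "__file__",
                                      "<dynamically created>") < 0) {
           Py_DECREF(mod);
           return NULL;
       }
       return mod;
   }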
Initializing C modules
======================

Module objects are usually created from extension modules (shared libraries which export an initialization function), or compiled-in modules (where the initialization function is added using "PyImport_AppendInittab()"). See Building C and C++ Extensions or Extending Embedded Python for details.

The initialization function can either pass a module definition instance to "PyModule_Create()", and return the resulting module object, or request “multi-phase initialization” by returning the definition struct itself.

type PyModuleDef
   * Part of the Stable ABI (including all members).*

   The module definition struct, which holds all information needed to create a module object. There is usually only one statically initialized variable of this type for each module.

   PyModuleDef_Base m_base

      Always initialize this member to "PyModuleDef_HEAD_INIT".

   const char *m_name

      Name for the new module.

   const char *m_doc

      Docstring for the module; usually a docstring variable created with "PyDoc_STRVAR" is used.

   Py_ssize_t m_size

      Module state may be kept in a per-module memory area that can be retrieved with "PyModule_GetState()", rather than in static globals. This makes modules safe for use in multiple sub-interpreters.

      This memory area is allocated based on *m_size* on module creation, and freed when the module object is deallocated, after the "m_free" function has been called, if present.

      Setting "m_size" to "-1" means that the module does not support sub-interpreters, because it has global state.

      Setting it to a non-negative value means that the module can be re-initialized and specifies the additional amount of memory it requires for its state. Non-negative "m_size" is required for multi-phase initialization.

      See **PEP 3121** for more details.

   PyMethodDef *m_methods

      A pointer to a table of module-level functions, described by "PyMethodDef" values. Can be "NULL" if no functions are present.

   PyModuleDef_Slot *m_slots

      An array of slot definitions for multi-phase initialization, terminated by a "{0, NULL}" entry. When using single-phase initialization, *m_slots* must be "NULL".

      Changed in version 3.5: Prior to version 3.5, this member was always set to "NULL", and was defined as:

         inquiry m_reload

   traverseproc m_traverse

      A traversal function to call during GC traversal of the module object, or "NULL" if not needed.

      This function is not called if the module state was requested but is not allocated yet. This is the case immediately after the module is created and before the module is executed ("Py_mod_exec" function). More precisely, this function is not called if "m_size" is greater than 0 and the module state (as returned by "PyModule_GetState()") is "NULL".

      Changed in version 3.9: No longer called before the module state is allocated.

   inquiry m_clear

      A clear function to call during GC clearing of the module object, or "NULL" if not needed.

      This function is not called if the module state was requested but is not allocated yet. This is the case immediately after the module is created and before the module is executed ("Py_mod_exec" function). More precisely, this function is not called if "m_size" is greater than 0 and the module state (as returned by "PyModule_GetState()") is "NULL".

      Like "PyTypeObject.tp_clear", this function is not *always* called before a module is deallocated. For example, when reference counting is enough to determine that an object is no longer used, the cyclic garbage collector is not involved and "m_free" is called directly.
      Changed in version 3.9: No longer called before the module state is allocated.

   freefunc m_free

      A function to call during deallocation of the module object, or "NULL" if not needed.

      This function is not called if the module state was requested but is not allocated yet. This is the case immediately after the module is created and before the module is executed ("Py_mod_exec" function). More precisely, this function is not called if "m_size" is greater than 0 and the module state (as returned by "PyModule_GetState()") is "NULL".

      Changed in version 3.9: No longer called before the module state is allocated.

Single-phase initialization
---------------------------

The module initialization function may create and return the module object directly. This is referred to as “single-phase initialization”, and uses one of the following two module creation functions:

PyObject *PyModule_Create(PyModuleDef *def)
   *Return value: New reference.*

   Create a new module object, given the definition in *def*. This behaves like "PyModule_Create2()" with *module_api_version* set to "PYTHON_API_VERSION".

PyObject *PyModule_Create2(PyModuleDef *def, int module_api_version)
   *Return value: New reference.** Part of the Stable ABI.*

   Create a new module object, given the definition in *def*, assuming the API version *module_api_version*. If that version does not match the version of the running interpreter, a "RuntimeWarning" is emitted.

   Return "NULL" with an exception set on error.

   Note: Most uses of this function should be using "PyModule_Create()" instead; only use this if you are sure you need it.

Before it is returned from the initialization function, the resulting module object is typically populated using functions like "PyModule_AddObjectRef()".

Multi-phase initialization
--------------------------

An alternate way to specify extensions is to request “multi-phase initialization”. Extension modules created this way behave more like Python modules: the initialization is split between the *creation phase*, when the module object is created, and the *execution phase*, when it is populated. The distinction is similar to the "__new__()" and "__init__()" methods of classes.

Unlike modules created using single-phase initialization, these modules are not singletons. For example, if the "sys.modules" entry is removed and the module is re-imported, a new module object is created, and typically populated with fresh method and type objects. The old module is subject to normal garbage collection. This mirrors the behavior of pure-Python modules.

Additional module instances may be created in sub-interpreters or after Python runtime reinitialization ("Py_Finalize()" and "Py_Initialize()"). In these cases, sharing Python objects between module instances would likely cause crashes or undefined behavior.

To avoid such issues, each instance of an extension module should be *isolated*: changes to one instance should not implicitly affect the others, and all state, including references to Python objects, should be specific to a particular module instance. See Isolating Extension Modules for more details and a practical guide.

A simpler way to avoid these issues is raising an error on repeated initialization.

All modules created using multi-phase initialization are expected to support sub-interpreters, or otherwise explicitly signal a lack of support. This is usually achieved by isolation or blocking repeated initialization, as above.
A module may also be limited to the main interpreter using the "Py_mod_multiple_interpreters" slot. To request multi-phase initialization, the initialization function (PyInit_modulename) returns a "PyModuleDef" instance with non-empty "m_slots". Before it is returned, the "PyModuleDef" instance must be initialized with the following function: PyObject *PyModuleDef_Init(PyModuleDef *def) *Return value: Borrowed reference.** Part of the Stable ABI since version 3.5.* Ensures a module definition is a properly initialized Python object that correctly reports its type and reference count. Returns *def* cast to "PyObject*", or "NULL" if an error occurred. Added in version 3.5. The *m_slots* member of the module definition must point to an array of "PyModuleDef_Slot" structures: type PyModuleDef_Slot int slot A slot ID, chosen from the available values explained below. void *value Value of the slot, whose meaning depends on the slot ID. Added in version 3.5. The *m_slots* array must be terminated by a slot with id 0. The available slot types are: Py_mod_create Specifies a function that is called to create the module object itself. The *value* pointer of this slot must point to a function of the signature: PyObject *create_module(PyObject *spec, PyModuleDef *def) The function receives a "ModuleSpec" instance, as defined in **PEP 451**, and the module definition. It should return a new module object, or set an error and return "NULL". This function should be kept minimal. In particular, it should not call arbitrary Python code, as trying to import the same module again may result in an infinite loop. Multiple "Py_mod_create" slots may not be specified in one module definition. If "Py_mod_create" is not specified, the import machinery will create a normal module object using "PyModule_New()". The name is taken from *spec*, not the definition, to allow extension modules to dynamically adjust to their place in the module hierarchy and be imported under different names through symlinks, all while sharing a single module definition. There is no requirement for the returned object to be an instance of "PyModule_Type". Any type can be used, as long as it supports setting and getting import-related attributes. However, only "PyModule_Type" instances may be returned if the "PyModuleDef" has non-"NULL" "m_traverse", "m_clear", "m_free"; non-zero "m_size"; or slots other than "Py_mod_create". Py_mod_exec Specifies a function that is called to *execute* the module. This is equivalent to executing the code of a Python module: typically, this function adds classes and constants to the module. The signature of the function is: int exec_module(PyObject *module) If multiple "Py_mod_exec" slots are specified, they are processed in the order they appear in the *m_slots* array. Py_mod_multiple_interpreters Specifies one of the following values: Py_MOD_MULTIPLE_INTERPRETERS_NOT_SUPPORTED The module does not support being imported in subinterpreters. Py_MOD_MULTIPLE_INTERPRETERS_SUPPORTED The module supports being imported in subinterpreters, but only when they share the main interpreter’s GIL. (See Isolating Extension Modules.) Py_MOD_PER_INTERPRETER_GIL_SUPPORTED The module supports being imported in subinterpreters, even when they have their own GIL. (See Isolating Extension Modules.) This slot determines whether or not importing this module in a subinterpreter will fail. Multiple "Py_mod_multiple_interpreters" slots may not be specified in one module definition. 
If "Py_mod_multiple_interpreters" is not specified, the import machinery defaults to "Py_MOD_MULTIPLE_INTERPRETERS_SUPPORTED". Added in version 3.12. Py_mod_gil Specifies one of the following values: Py_MOD_GIL_USED The module depends on the presence of the global interpreter lock (GIL), and may access global state without synchronization. Py_MOD_GIL_NOT_USED The module is safe to run without an active GIL. This slot is ignored by Python builds not configured with "-- disable-gil". Otherwise, it determines whether or not importing this module will cause the GIL to be automatically enabled. See Free-threaded CPython for more detail. Multiple "Py_mod_gil" slots may not be specified in one module definition. If "Py_mod_gil" is not specified, the import machinery defaults to "Py_MOD_GIL_USED". Added in version 3.13. See **PEP 489** for more details on multi-phase initialization. Low-level module creation functions ----------------------------------- The following functions are called under the hood when using multi- phase initialization. They can be used directly, for example when creating module objects dynamically. Note that both "PyModule_FromDefAndSpec" and "PyModule_ExecDef" must be called to fully initialize a module. PyObject *PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec) *Return value: New reference.* Create a new module object, given the definition in *def* and the ModuleSpec *spec*. This behaves like "PyModule_FromDefAndSpec2()" with *module_api_version* set to "PYTHON_API_VERSION". Added in version 3.5. PyObject *PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec, int module_api_version) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Create a new module object, given the definition in *def* and the ModuleSpec *spec*, assuming the API version *module_api_version*. If that version does not match the version of the running interpreter, a "RuntimeWarning" is emitted. Return "NULL" with an exception set on error. Note: Most uses of this function should be using "PyModule_FromDefAndSpec()" instead; only use this if you are sure you need it. Added in version 3.5. int PyModule_ExecDef(PyObject *module, PyModuleDef *def) * Part of the Stable ABI since version 3.7.* Process any execution slots ("Py_mod_exec") given in *def*. Added in version 3.5. int PyModule_SetDocString(PyObject *module, const char *docstring) * Part of the Stable ABI since version 3.7.* Set the docstring for *module* to *docstring*. This function is called automatically when creating a module from "PyModuleDef", using either "PyModule_Create" or "PyModule_FromDefAndSpec". Added in version 3.5. int PyModule_AddFunctions(PyObject *module, PyMethodDef *functions) * Part of the Stable ABI since version 3.7.* Add the functions from the "NULL" terminated *functions* array to *module*. Refer to the "PyMethodDef" documentation for details on individual entries (due to the lack of a shared module namespace, module level “functions” implemented in C typically receive the module as their first parameter, making them similar to instance methods on Python classes). This function is called automatically when creating a module from "PyModuleDef", using either "PyModule_Create" or "PyModule_FromDefAndSpec". Added in version 3.5. 
Support functions
-----------------

The module initialization function (if using single phase initialization) or a function called from a module execution slot (if using multi-phase initialization), can use the following functions to help initialize the module state:

int PyModule_AddObjectRef(PyObject *module, const char *name, PyObject *value)
   * Part of the Stable ABI since version 3.10.*

   Add an object to *module* as *name*. This is a convenience function which can be used from the module’s initialization function.

   On success, return "0". On error, raise an exception and return "-1".

   Example usage:

      static int
      add_spam(PyObject *module, int value)
      {
          PyObject *obj = PyLong_FromLong(value);
          if (obj == NULL) {
              return -1;
          }
          int res = PyModule_AddObjectRef(module, "spam", obj);
          Py_DECREF(obj);
          return res;
      }

   As a convenience, the function accepts a "NULL" *value* with an exception set. In this case, it returns "-1" and just leaves the raised exception unchanged.

   The example can also be written without checking explicitly if *obj* is "NULL":

      static int
      add_spam(PyObject *module, int value)
      {
          PyObject *obj = PyLong_FromLong(value);
          int res = PyModule_AddObjectRef(module, "spam", obj);
          Py_XDECREF(obj);
          return res;
      }

   Note that "Py_XDECREF()" should be used instead of "Py_DECREF()" in this case, since *obj* can be "NULL".

   The number of different *name* strings passed to this function should be kept small, usually by only using statically allocated strings as *name*. For names that aren’t known at compile time, prefer calling "PyUnicode_FromString()" and "PyObject_SetAttr()" directly. For more details, see "PyUnicode_InternFromString()", which may be used internally to create a key object.

   Added in version 3.10.

int PyModule_Add(PyObject *module, const char *name, PyObject *value)
   * Part of the Stable ABI since version 3.13.*

   Similar to "PyModule_AddObjectRef()", but “steals” a reference to *value*. It can be called with the result of a function that returns a new reference, without bothering to check its result or even saving it to a variable.

   Example usage:

      if (PyModule_Add(module, "spam", PyBytes_FromString(value)) < 0) {
          goto error;
      }

   Added in version 3.13.

int PyModule_AddObject(PyObject *module, const char *name, PyObject *value)
   * Part of the Stable ABI.*

   Similar to "PyModule_AddObjectRef()", but steals a reference to *value* on success (if it returns "0").

   The new "PyModule_Add()" or "PyModule_AddObjectRef()" functions are recommended, since it is easy to introduce reference leaks by misusing the "PyModule_AddObject()" function.

   Note: Unlike other functions that steal references, "PyModule_AddObject()" only releases the reference to *value* **on success**. This means that its return value must be checked, and calling code must "Py_XDECREF()" *value* manually on error.

   Example usage:

      PyObject *obj = PyBytes_FromString(value);
      if (PyModule_AddObject(module, "spam", obj) < 0) {
          // If 'obj' is not NULL and PyModule_AddObject() failed,
          // 'obj' strong reference must be deleted with Py_XDECREF().
          // If 'obj' is NULL, Py_XDECREF() does nothing.
          Py_XDECREF(obj);
          goto error;
      }
      // PyModule_AddObject() stole a reference to obj:
      // Py_XDECREF(obj) is not needed here.

   Deprecated since version 3.13: "PyModule_AddObject()" is *soft deprecated*.

int PyModule_AddIntConstant(PyObject *module, const char *name, long value)
   * Part of the Stable ABI.*

   Add an integer constant to *module* as *name*. This convenience function can be used from the module’s initialization function.
Return "-1" with an exception set on error, "0" on success. This is a convenience function that calls "PyLong_FromLong()" and "PyModule_AddObjectRef()"; see their documentation for details. int PyModule_AddStringConstant(PyObject *module, const char *name, const char *value) * Part of the Stable ABI.* Add a string constant to *module* as *name*. This convenience function can be used from the module’s initialization function. The string *value* must be "NULL"-terminated. Return "-1" with an exception set on error, "0" on success. This is a convenience function that calls "PyUnicode_InternFromString()" and "PyModule_AddObjectRef()"; see their documentation for details. PyModule_AddIntMacro(module, macro) Add an int constant to *module*. The name and the value are taken from *macro*. For example "PyModule_AddIntMacro(module, AF_INET)" adds the int constant *AF_INET* with the value of *AF_INET* to *module*. Return "-1" with an exception set on error, "0" on success. PyModule_AddStringMacro(module, macro) Add a string constant to *module*. int PyModule_AddType(PyObject *module, PyTypeObject *type) * Part of the Stable ABI since version 3.10.* Add a type object to *module*. The type object is finalized by calling internally "PyType_Ready()". The name of the type object is taken from the last component of "tp_name" after dot. Return "-1" with an exception set on error, "0" on success. Added in version 3.9. int PyUnstable_Module_SetGIL(PyObject *module, void *gil) *This is Unstable API. It may change without warning in minor releases.* Indicate that *module* does or does not support running without the global interpreter lock (GIL), using one of the values from "Py_mod_gil". It must be called during *module*’s initialization function. If this function is not called during module initialization, the import machinery assumes the module does not support running without the GIL. This function is only available in Python builds configured with "--disable-gil". Return "-1" with an exception set on error, "0" on success. Added in version 3.13. Module lookup ============= Single-phase initialization creates singleton modules that can be looked up in the context of the current interpreter. This allows the module object to be retrieved later with only a reference to the module definition. These functions will not work on modules created using multi-phase initialization, since multiple such modules can be created from a single definition. PyObject *PyState_FindModule(PyModuleDef *def) *Return value: Borrowed reference.** Part of the Stable ABI.* Returns the module object that was created from *def* for the current interpreter. This method requires that the module object has been attached to the interpreter state with "PyState_AddModule()" beforehand. In case the corresponding module object is not found or has not been attached to the interpreter state yet, it returns "NULL". int PyState_AddModule(PyObject *module, PyModuleDef *def) * Part of the Stable ABI since version 3.3.* Attaches the module object passed to the function to the interpreter state. This allows the module object to be accessible via "PyState_FindModule()". Only effective on modules created using single-phase initialization. Python calls "PyState_AddModule" automatically after importing a module, so it is unnecessary (but harmless) to call it from module initialization code. An explicit call is needed only if the module’s own init code subsequently calls "PyState_FindModule". 
The function is mainly intended for implementing alternative import mechanisms (either by calling it directly, or by referring to its implementation for details of the required state updates). The caller must hold the GIL. Return "-1" with an exception set on error, "0" on success. Added in version 3.3. int PyState_RemoveModule(PyModuleDef *def) * Part of the Stable ABI since version 3.3.* Removes the module object created from *def* from the interpreter state. Return "-1" with an exception set on error, "0" on success. The caller must hold the GIL. Added in version 3.3. Monitoring C API **************** Added in version 3.13. An extension may need to interact with the event monitoring system. Subscribing to events and registering callbacks can be done via the Python API exposed in "sys.monitoring". Generating Execution Events *************************** The functions below make it possible for an extension to fire monitoring events as it emulates the execution of Python code. Each of these functions accepts a "PyMonitoringState" struct which contains concise information about the activation state of events, as well as the event arguments, which include a "PyObject*" representing the code object, the instruction offset and sometimes additional, event- specific arguments (see "sys.monitoring" for details about the signatures of the different event callbacks). The "codelike" argument should be an instance of "types.CodeType" or of a type that emulates it. The VM disables tracing when firing an event, so there is no need for user code to do that. Monitoring functions should not be called with an exception set, except those listed below as working with the current exception. type PyMonitoringState Representation of the state of an event type. It is allocated by the user while its contents are maintained by the monitoring API functions described below. All of the functions below return 0 on success and -1 (with an exception set) on error. See "sys.monitoring" for descriptions of the events. int PyMonitoring_FirePyStartEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset) Fire a "PY_START" event. int PyMonitoring_FirePyResumeEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset) Fire a "PY_RESUME" event. int PyMonitoring_FirePyReturnEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset, PyObject *retval) Fire a "PY_RETURN" event. int PyMonitoring_FirePyYieldEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset, PyObject *retval) Fire a "PY_YIELD" event. int PyMonitoring_FireCallEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset, PyObject *callable, PyObject *arg0) Fire a "CALL" event. int PyMonitoring_FireLineEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset, int lineno) Fire a "LINE" event. int PyMonitoring_FireJumpEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset, PyObject *target_offset) Fire a "JUMP" event. int PyMonitoring_FireBranchEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset, PyObject *target_offset) Fire a "BRANCH" event. int PyMonitoring_FireCReturnEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset, PyObject *retval) Fire a "C_RETURN" event. int PyMonitoring_FirePyThrowEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset) Fire a "PY_THROW" event with the current exception (as returned by "PyErr_GetRaisedException()"). 
int PyMonitoring_FireRaiseEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset)

   Fire a "RAISE" event with the current exception (as returned by "PyErr_GetRaisedException()").

int PyMonitoring_FireCRaiseEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset)

   Fire a "C_RAISE" event with the current exception (as returned by "PyErr_GetRaisedException()").

int PyMonitoring_FireReraiseEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset)

   Fire a "RERAISE" event with the current exception (as returned by "PyErr_GetRaisedException()").

int PyMonitoring_FireExceptionHandledEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset)

   Fire an "EXCEPTION_HANDLED" event with the current exception (as returned by "PyErr_GetRaisedException()").

int PyMonitoring_FirePyUnwindEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset)

   Fire a "PY_UNWIND" event with the current exception (as returned by "PyErr_GetRaisedException()").

int PyMonitoring_FireStopIterationEvent(PyMonitoringState *state, PyObject *codelike, int32_t offset, PyObject *value)

   Fire a "STOP_ITERATION" event. If "value" is an instance of "StopIteration", it is used. Otherwise, a new "StopIteration" instance is created with "value" as its argument.

Managing the Monitoring State
=============================

Monitoring states can be managed with the help of monitoring scopes. A scope would typically correspond to a Python function.

int PyMonitoring_EnterScope(PyMonitoringState *state_array, uint64_t *version, const uint8_t *event_types, Py_ssize_t length)

   Enter a monitored scope. "event_types" is an array of the event IDs for events that may be fired from the scope. For example, the ID of a "PY_START" event is the value "PY_MONITORING_EVENT_PY_START", which is numerically equal to the base-2 logarithm of "sys.monitoring.events.PY_START". "state_array" is an array with a monitoring state entry for each event in "event_types"; it is allocated by the user but populated by "PyMonitoring_EnterScope()" with information about the activation state of the event. The size of "event_types" (and hence also of "state_array") is given in "length".

   The "version" argument is a pointer to a value which should be allocated by the user together with "state_array" and initialized to 0, and then set only by "PyMonitoring_EnterScope()" itself. It allows this function to determine whether event states have changed since the previous call, and to return quickly if they have not.

   The scopes referred to here are lexical scopes: a function, class or method. "PyMonitoring_EnterScope()" should be called whenever the lexical scope is entered. Scopes can be reentered, reusing the same *state_array* and *version*, in situations like when emulating a recursive Python function.

   When a code-like’s execution is paused, such as when emulating a generator, the scope needs to be exited and re-entered.
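As a minimal sketch of this workflow (the helper is hypothetical, and keeping *version* in a static variable is only one way to satisfy the allocation requirement above), the following enters a scope, fires a single "PY_START" event for a code-like object, and exits again, using one of the event-type macros listed in the table below:

   static int
   emulate_start(PyObject *codelike)
   {
       static uint64_t version = 0;   /* set only by PyMonitoring_EnterScope() */
       PyMonitoringState state[1];
       uint8_t events[] = { PY_MONITORING_EVENT_PY_START };

       if (PyMonitoring_EnterScope(state, &version, events, 1) < 0) {
           return -1;
       }
       /* Offset 0: the notional start of the emulated code. */
       if (PyMonitoring_FirePyStartEvent(&state[0], codelike, 0) < 0) {
           PyMonitoring_ExitScope();
           return -1;
       }
       return PyMonitoring_ExitScope();
   }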
The macros for *event_types* are: +----------------------------------------------------+---------------------------------------+ | Macro | Event | |====================================================|=======================================| | PY_MONITORING_EVENT_BRANCH | "BRANCH" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_CALL | "CALL" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_C_RAISE | "C_RAISE" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_C_RETURN | "C_RETURN" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_EXCEPTION_HANDLED | "EXCEPTION_HANDLED" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_INSTRUCTION | "INSTRUCTION" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_JUMP | "JUMP" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_LINE | "LINE" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_PY_RESUME | "PY_RESUME" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_PY_RETURN | "PY_RETURN" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_PY_START | "PY_START" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_PY_THROW | "PY_THROW" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_PY_UNWIND | "PY_UNWIND" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_PY_YIELD | "PY_YIELD" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_RAISE | "RAISE" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_RERAISE | "RERAISE" | +----------------------------------------------------+---------------------------------------+ | PY_MONITORING_EVENT_STOP_ITERATION | "STOP_ITERATION" | +----------------------------------------------------+---------------------------------------+ int PyMonitoring_ExitScope(void) Exit the last scope that was entered with "PyMonitoring_EnterScope()". int PY_MONITORING_IS_INSTRUMENTED_EVENT(uint8_t ev) Return true if the event corresponding to the event ID *ev* is a local event. Added in version 3.13. Deprecated since version 3.13.3: This function is *soft deprecated*. The "None" Object ***************** Note that the "PyTypeObject" for "None" is not directly exposed in the Python/C API. Since "None" is a singleton, testing for object identity (using "==" in C) is sufficient. There is no "PyNone_Check()" function for the same reason. PyObject *Py_None The Python "None" object, denoting lack of value. This object has no methods and is *immortal*. Changed in version 3.12: "Py_None" is *immortal*. Py_RETURN_NONE Return "Py_None" from a function. 
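For example, a C function that returns "None" would typically end with the macro, which creates the strong reference for you (a minimal sketch; the function name is hypothetical):

   static PyObject *
   do_nothing(PyObject *self, PyObject *Py_UNUSED(args))
   {
       /* Py_RETURN_NONE returns a new strong reference to None. */
       Py_RETURN_NONE;
   }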
Number Protocol *************** int PyNumber_Check(PyObject *o) * Part of the Stable ABI.* Returns "1" if the object *o* provides numeric protocols, and false otherwise. This function always succeeds. Changed in version 3.8: Returns "1" if *o* is an index integer. PyObject *PyNumber_Add(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of adding *o1* and *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 + o2". PyObject *PyNumber_Subtract(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of subtracting *o2* from *o1*, or "NULL" on failure. This is the equivalent of the Python expression "o1 - o2". PyObject *PyNumber_Multiply(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of multiplying *o1* and *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 * o2". PyObject *PyNumber_MatrixMultiply(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Returns the result of matrix multiplication on *o1* and *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 @ o2". Added in version 3.5. PyObject *PyNumber_FloorDivide(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Return the floor of *o1* divided by *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 // o2". PyObject *PyNumber_TrueDivide(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Return a reasonable approximation for the mathematical value of *o1* divided by *o2*, or "NULL" on failure. The return value is “approximate” because binary floating-point numbers are approximate; it is not possible to represent all real numbers in base two. This function can return a floating-point value when passed two integers. This is the equivalent of the Python expression "o1 / o2". PyObject *PyNumber_Remainder(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the remainder of dividing *o1* by *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 % o2". PyObject *PyNumber_Divmod(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* See the built-in function "divmod()". Returns "NULL" on failure. This is the equivalent of the Python expression "divmod(o1, o2)". PyObject *PyNumber_Power(PyObject *o1, PyObject *o2, PyObject *o3) *Return value: New reference.** Part of the Stable ABI.* See the built-in function "pow()". Returns "NULL" on failure. This is the equivalent of the Python expression "pow(o1, o2, o3)", where *o3* is optional. If *o3* is to be ignored, pass "Py_None" in its place (passing "NULL" for *o3* would cause an illegal memory access). PyObject *PyNumber_Negative(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Returns the negation of *o* on success, or "NULL" on failure. This is the equivalent of the Python expression "-o". PyObject *PyNumber_Positive(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Returns *o* on success, or "NULL" on failure. This is the equivalent of the Python expression "+o". PyObject *PyNumber_Absolute(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Returns the absolute value of *o*, or "NULL" on failure. This is the equivalent of the Python expression "abs(o)". 
PyObject *PyNumber_Invert(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Returns the bitwise negation of *o* on success, or "NULL" on failure. This is the equivalent of the Python expression "~o". PyObject *PyNumber_Lshift(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of left shifting *o1* by *o2* on success, or "NULL" on failure. This is the equivalent of the Python expression "o1 << o2". PyObject *PyNumber_Rshift(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of right shifting *o1* by *o2* on success, or "NULL" on failure. This is the equivalent of the Python expression "o1 >> o2". PyObject *PyNumber_And(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the “bitwise and” of *o1* and *o2* on success and "NULL" on failure. This is the equivalent of the Python expression "o1 & o2". PyObject *PyNumber_Xor(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the “bitwise exclusive or” of *o1* by *o2* on success, or "NULL" on failure. This is the equivalent of the Python expression "o1 ^ o2". PyObject *PyNumber_Or(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the “bitwise or” of *o1* and *o2* on success, or "NULL" on failure. This is the equivalent of the Python expression "o1 | o2". PyObject *PyNumber_InPlaceAdd(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of adding *o1* and *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 += o2". PyObject *PyNumber_InPlaceSubtract(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of subtracting *o2* from *o1*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 -= o2". PyObject *PyNumber_InPlaceMultiply(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of multiplying *o1* and *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 *= o2". PyObject *PyNumber_InPlaceMatrixMultiply(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Returns the result of matrix multiplication on *o1* and *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 @= o2". Added in version 3.5. PyObject *PyNumber_InPlaceFloorDivide(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the mathematical floor of dividing *o1* by *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 //= o2". PyObject *PyNumber_InPlaceTrueDivide(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Return a reasonable approximation for the mathematical value of *o1* divided by *o2*, or "NULL" on failure. The return value is “approximate” because binary floating-point numbers are approximate; it is not possible to represent all real numbers in base two. This function can return a floating-point value when passed two integers. 
The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 /= o2". PyObject *PyNumber_InPlaceRemainder(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the remainder of dividing *o1* by *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 %= o2". PyObject *PyNumber_InPlacePower(PyObject *o1, PyObject *o2, PyObject *o3) *Return value: New reference.** Part of the Stable ABI.* See the built-in function "pow()". Returns "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 **= o2" when o3 is "Py_None", or an in-place variant of "pow(o1, o2, o3)" otherwise. If *o3* is to be ignored, pass "Py_None" in its place (passing "NULL" for *o3* would cause an illegal memory access). PyObject *PyNumber_InPlaceLshift(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of left shifting *o1* by *o2* on success, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 <<= o2". PyObject *PyNumber_InPlaceRshift(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the result of right shifting *o1* by *o2* on success, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 >>= o2". PyObject *PyNumber_InPlaceAnd(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the “bitwise and” of *o1* and *o2* on success and "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 &= o2". PyObject *PyNumber_InPlaceXor(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the “bitwise exclusive or” of *o1* by *o2* on success, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 ^= o2". PyObject *PyNumber_InPlaceOr(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Returns the “bitwise or” of *o1* and *o2* on success, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 |= o2". PyObject *PyNumber_Long(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Returns the *o* converted to an integer object on success, or "NULL" on failure. This is the equivalent of the Python expression "int(o)". PyObject *PyNumber_Float(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Returns the *o* converted to a float object on success, or "NULL" on failure. This is the equivalent of the Python expression "float(o)". PyObject *PyNumber_Index(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Returns the *o* converted to a Python int on success or "NULL" with a "TypeError" exception raised on failure. Changed in version 3.10: The result always has exact type "int". Previously, the result could have been an instance of a subclass of "int". PyObject *PyNumber_ToBase(PyObject *n, int base) *Return value: New reference.** Part of the Stable ABI.* Returns the integer *n* converted to base *base* as a string. The *base* argument must be one of 2, 8, 10, or 16. 
For base 2, 8, or 16, the returned string is prefixed with a base marker of "'0b'", "'0o'", or "'0x'", respectively. If *n* is not a Python int, it is converted with "PyNumber_Index()" first. Py_ssize_t PyNumber_AsSsize_t(PyObject *o, PyObject *exc) * Part of the Stable ABI.* Returns *o* converted to a "Py_ssize_t" value if *o* can be interpreted as an integer. If the call fails, an exception is raised and "-1" is returned. If *o* can be converted to a Python int but the attempt to convert to a "Py_ssize_t" value would raise an "OverflowError", then the *exc* argument is the type of exception that will be raised (usually "IndexError" or "OverflowError"). If *exc* is "NULL", then the exception is cleared and the value is clipped to "PY_SSIZE_T_MIN" for a negative integer or "PY_SSIZE_T_MAX" for a positive integer. int PyIndex_Check(PyObject *o) * Part of the Stable ABI since version 3.8.* Returns "1" if *o* is an index integer (has the "nb_index" slot of the "tp_as_number" structure filled in), and "0" otherwise. This function always succeeds. Object Protocol *************** PyObject *Py_GetConstant(unsigned int constant_id) * Part of the Stable ABI since version 3.13.* Get a *strong reference* to a constant. Set an exception and return "NULL" if *constant_id* is invalid. *constant_id* must be one of these constant identifiers: +------------------------------------------+-------+---------------------------+ | Constant Identifier | Value | Returned object | |==========================================|=======|===========================| | Py_CONSTANT_NONE | "0" | "None" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_FALSE | "1" | "False" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_TRUE | "2" | "True" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_ELLIPSIS | "3" | "Ellipsis" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_NOT_IMPLEMENTED | "4" | "NotImplemented" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_ZERO | "5" | "0" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_ONE | "6" | "1" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_EMPTY_STR | "7" | "''" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_EMPTY_BYTES | "8" | "b''" | +------------------------------------------+-------+---------------------------+ | Py_CONSTANT_EMPTY_TUPLE | "9" | "()" | +------------------------------------------+-------+---------------------------+ Numeric values are only given for projects which cannot use the constant identifiers. Added in version 3.13. **CPython implementation detail:** In CPython, all of these constants are *immortal*. PyObject *Py_GetConstantBorrowed(unsigned int constant_id) * Part of the Stable ABI since version 3.13.* Similar to "Py_GetConstant()", but return a *borrowed reference*. This function is primarily intended for backwards compatibility: using "Py_GetConstant()" is recommended for new code. The reference is borrowed from the interpreter, and is valid until the interpreter finalization. Added in version 3.13. PyObject *Py_NotImplemented The "NotImplemented" singleton, used to signal that an operation is not implemented for the given type combination. 
Py_RETURN_NOTIMPLEMENTED

   Properly handle returning "Py_NotImplemented" from within a C function (that is, create a new *strong reference* to "NotImplemented" and return it).

Py_PRINT_RAW

   Flag to be used with multiple functions that print the object (like "PyObject_Print()" and "PyFile_WriteObject()"). If passed, these functions use the "str()" of the object instead of the "repr()".

int PyObject_Print(PyObject *o, FILE *fp, int flags)

   Print an object *o*, on file *fp*. Returns "-1" on error. The flags argument is used to enable certain printing options. The only option currently supported is "Py_PRINT_RAW"; if given, the "str()" of the object is written instead of the "repr()".

int PyObject_HasAttrWithError(PyObject *o, PyObject *attr_name)
   * Part of the Stable ABI since version 3.13.*

   Returns "1" if *o* has the attribute *attr_name*, and "0" otherwise. This is equivalent to the Python expression "hasattr(o, attr_name)". On failure, return "-1".

   Added in version 3.13.

int PyObject_HasAttrStringWithError(PyObject *o, const char *attr_name)
   * Part of the Stable ABI since version 3.13.*

   This is the same as "PyObject_HasAttrWithError()", but *attr_name* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*.

   Added in version 3.13.

int PyObject_HasAttr(PyObject *o, PyObject *attr_name)
   * Part of the Stable ABI.*

   Returns "1" if *o* has the attribute *attr_name*, and "0" otherwise. This function always succeeds.

   Note: Exceptions that occur when this calls "__getattr__()" and "__getattribute__()" methods aren’t propagated, but instead given to "sys.unraisablehook()". For proper error handling, use "PyObject_HasAttrWithError()", "PyObject_GetOptionalAttr()" or "PyObject_GetAttr()" instead.

int PyObject_HasAttrString(PyObject *o, const char *attr_name)
   * Part of the Stable ABI.*

   This is the same as "PyObject_HasAttr()", but *attr_name* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*.

   Note: Exceptions that occur when this calls "__getattr__()" and "__getattribute__()" methods or while creating the temporary "str" object are silently ignored. For proper error handling, use "PyObject_HasAttrStringWithError()", "PyObject_GetOptionalAttrString()" or "PyObject_GetAttrString()" instead.

PyObject *PyObject_GetAttr(PyObject *o, PyObject *attr_name)
   *Return value: New reference.** Part of the Stable ABI.*

   Retrieve an attribute named *attr_name* from object *o*. Returns the attribute value on success, or "NULL" on failure. This is the equivalent of the Python expression "o.attr_name".

   If the missing attribute should not be treated as a failure, you can use "PyObject_GetOptionalAttr()" instead.

PyObject *PyObject_GetAttrString(PyObject *o, const char *attr_name)
   *Return value: New reference.** Part of the Stable ABI.*

   This is the same as "PyObject_GetAttr()", but *attr_name* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*.

   If the missing attribute should not be treated as a failure, you can use "PyObject_GetOptionalAttrString()" instead.

int PyObject_GetOptionalAttr(PyObject *obj, PyObject *attr_name, PyObject **result)
   * Part of the Stable ABI since version 3.13.*

   Variant of "PyObject_GetAttr()" which doesn’t raise "AttributeError" if the attribute is not found.

   If the attribute is found, return "1" and set **result* to a new *strong reference* to the attribute. If the attribute is not found, return "0" and set **result* to "NULL"; the "AttributeError" is silenced.
   If an error other than "AttributeError" is raised, return "-1" and set **result* to "NULL".

   Added in version 3.13.

int PyObject_GetOptionalAttrString(PyObject *obj, const char *attr_name, PyObject **result)
   * Part of the Stable ABI since version 3.13.*

   This is the same as "PyObject_GetOptionalAttr()", but *attr_name* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*.

   Added in version 3.13.

PyObject *PyObject_GenericGetAttr(PyObject *o, PyObject *name)
   *Return value: New reference.** Part of the Stable ABI.*

   Generic attribute getter function that is meant to be put into a type object’s "tp_getattro" slot. It looks for a descriptor in the dictionary of classes in the object’s MRO as well as an attribute in the object’s "__dict__" (if present). As outlined in Implementing Descriptors, data descriptors take preference over instance attributes, while non-data descriptors don’t. Otherwise, an "AttributeError" is raised.

int PyObject_SetAttr(PyObject *o, PyObject *attr_name, PyObject *v)
   * Part of the Stable ABI.*

   Set the value of the attribute named *attr_name*, for object *o*, to the value *v*. Raise an exception and return "-1" on failure; return "0" on success. This is the equivalent of the Python statement "o.attr_name = v".

   If *v* is "NULL", the attribute is deleted. This behaviour is deprecated in favour of using "PyObject_DelAttr()", but there are currently no plans to remove it.

int PyObject_SetAttrString(PyObject *o, const char *attr_name, PyObject *v)
   * Part of the Stable ABI.*

   This is the same as "PyObject_SetAttr()", but *attr_name* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*.

   If *v* is "NULL", the attribute is deleted, but this feature is deprecated in favour of using "PyObject_DelAttrString()".

   The number of different attribute names passed to this function should be kept small, usually by using a statically allocated string as *attr_name*. For attribute names that aren’t known at compile time, prefer calling "PyUnicode_FromString()" and "PyObject_SetAttr()" directly. For more details, see "PyUnicode_InternFromString()", which may be used internally to create a key object.

int PyObject_GenericSetAttr(PyObject *o, PyObject *name, PyObject *value)
   * Part of the Stable ABI.*

   Generic attribute setter and deleter function that is meant to be put into a type object’s "tp_setattro" slot. It looks for a data descriptor in the dictionary of classes in the object’s MRO, and if found it takes preference over setting or deleting the attribute in the instance dictionary. Otherwise, the attribute is set or deleted in the object’s "__dict__" (if present). On success, "0" is returned, otherwise an "AttributeError" is raised and "-1" is returned.

int PyObject_DelAttr(PyObject *o, PyObject *attr_name)
   * Part of the Stable ABI since version 3.13.*

   Delete attribute named *attr_name*, for object *o*. Returns "-1" on failure. This is the equivalent of the Python statement "del o.attr_name".

int PyObject_DelAttrString(PyObject *o, const char *attr_name)
   * Part of the Stable ABI since version 3.13.*

   This is the same as "PyObject_DelAttr()", but *attr_name* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*.

   The number of different attribute names passed to this function should be kept small, usually by using a statically allocated string as *attr_name*. For attribute names that aren’t known at compile time, prefer calling "PyUnicode_FromString()" and "PyObject_DelAttr()" directly.
For more details, see "PyUnicode_InternFromString()", which may be used internally to create a key object for lookup. PyObject *PyObject_GenericGetDict(PyObject *o, void *context) *Return value: New reference.** Part of the Stable ABI since version 3.10.* A generic implementation for the getter of a "__dict__" descriptor. It creates the dictionary if necessary. This function may also be called to get the "__dict__" of the object *o*. Pass "NULL" for *context* when calling it. Since this function may need to allocate memory for the dictionary, it may be more efficient to call "PyObject_GetAttr()" when accessing an attribute on the object. On failure, returns "NULL" with an exception set. Added in version 3.3. int PyObject_GenericSetDict(PyObject *o, PyObject *value, void *context) * Part of the Stable ABI since version 3.7.* A generic implementation for the setter of a "__dict__" descriptor. This implementation does not allow the dictionary to be deleted. Added in version 3.3. PyObject **_PyObject_GetDictPtr(PyObject *obj) Return a pointer to "__dict__" of the object *obj*. If there is no "__dict__", return "NULL" without setting an exception. This function may need to allocate memory for the dictionary, so it may be more efficient to call "PyObject_GetAttr()" when accessing an attribute on the object. PyObject *PyObject_RichCompare(PyObject *o1, PyObject *o2, int opid) *Return value: New reference.** Part of the Stable ABI.* Compare the values of *o1* and *o2* using the operation specified by *opid*, which must be one of "Py_LT", "Py_LE", "Py_EQ", "Py_NE", "Py_GT", or "Py_GE", corresponding to "<", "<=", "==", "!=", ">", or ">=" respectively. This is the equivalent of the Python expression "o1 op o2", where "op" is the operator corresponding to *opid*. Returns the value of the comparison on success, or "NULL" on failure. int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid) * Part of the Stable ABI.* Compare the values of *o1* and *o2* using the operation specified by *opid*, like "PyObject_RichCompare()", but returns "-1" on error, "0" if the result is false, "1" otherwise. Note: If *o1* and *o2* are the same object, "PyObject_RichCompareBool()" will always return "1" for "Py_EQ" and "0" for "Py_NE". PyObject *PyObject_Format(PyObject *obj, PyObject *format_spec) * Part of the Stable ABI.* Format *obj* using *format_spec*. This is equivalent to the Python expression "format(obj, format_spec)". *format_spec* may be "NULL". In this case the call is equivalent to "format(obj)". Returns the formatted string on success, "NULL" on failure. PyObject *PyObject_Repr(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Compute a string representation of object *o*. Returns the string representation on success, "NULL" on failure. This is the equivalent of the Python expression "repr(o)". Called by the "repr()" built-in function. Changed in version 3.4: This function now includes a debug assertion to help ensure that it does not silently discard an active exception. PyObject *PyObject_ASCII(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* As "PyObject_Repr()", compute a string representation of object *o*, but escape the non-ASCII characters in the string returned by "PyObject_Repr()" with "\x", "\u" or "\U" escapes. This generates a string similar to that returned by "PyObject_Repr()" in Python 2. Called by the "ascii()" built-in function. 
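The attribute functions above combine naturally. For illustration, here is a minimal sketch (the helper name and the attribute name "callback" are hypothetical, and "PyObject_CallNoArgs()" belongs to the Call Protocol rather than this section) that reads an optional attribute with "PyObject_GetOptionalAttrString()" (added in 3.13) and calls it if present:

   #include <Python.h>

   static int
   call_optional_callback(PyObject *obj)
   {
       PyObject *cb;
       /* 1 -> found, 0 -> not found (no error set), -1 -> real error */
       int rc = PyObject_GetOptionalAttrString(obj, "callback", &cb);
       if (rc < 0) {
           return -1;          /* exception already set */
       }
       if (rc == 0) {
           return 0;           /* attribute absent; nothing to do */
       }
       PyObject *res = PyObject_CallNoArgs(cb);
       Py_DECREF(cb);
       if (res == NULL) {
           return -1;
       }
       Py_DECREF(res);
       return 0;
   }

Unlike calling "PyObject_GetAttr()" and clearing "AttributeError" by hand, this pattern silences only the missing-attribute case while still reporting genuine errors.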
PyObject *PyObject_Str(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Compute a string representation of object *o*. Returns the string representation on success, "NULL" on failure. This is the equivalent of the Python expression "str(o)". Called by the "str()" built-in function and, therefore, by the "print()" function. Changed in version 3.4: This function now includes a debug assertion to help ensure that it does not silently discard an active exception. PyObject *PyObject_Bytes(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Compute a bytes representation of object *o*. "NULL" is returned on failure and a bytes object on success. This is equivalent to the Python expression "bytes(o)" when *o* is not an integer. Unlike "bytes(o)", a "TypeError" is raised when *o* is an integer, rather than returning a zero-initialized bytes object. int PyObject_IsSubclass(PyObject *derived, PyObject *cls) * Part of the Stable ABI.* Return "1" if the class *derived* is identical to or derived from the class *cls*, otherwise return "0". In case of an error, return "-1". If *cls* is a tuple, the check will be done against every entry in *cls*. The result will be "1" when at least one of the checks returns "1", otherwise it will be "0". If *cls* has a "__subclasscheck__()" method, it will be called to determine the subclass status as described in **PEP 3119**. Otherwise, *derived* is a subclass of *cls* if it is a direct or indirect subclass, i.e. contained in "cls.__mro__". Normally only class objects, i.e. instances of "type" or a derived class, are considered classes. However, objects can override this by having a "__bases__" attribute (which must be a tuple of base classes). int PyObject_IsInstance(PyObject *inst, PyObject *cls) * Part of the Stable ABI.* Return "1" if *inst* is an instance of the class *cls* or a subclass of *cls*, or "0" if not. On error, returns "-1" and sets an exception. If *cls* is a tuple, the check will be done against every entry in *cls*. The result will be "1" when at least one of the checks returns "1", otherwise it will be "0". If *cls* has a "__instancecheck__()" method, it will be called to determine the instance status as described in **PEP 3119**. Otherwise, *inst* is an instance of *cls* if its class is a subclass of *cls*. An instance *inst* can override what is considered its class by having a "__class__" attribute. An object *cls* can override if it is considered a class, and what its base classes are, by having a "__bases__" attribute (which must be a tuple of base classes). Py_hash_t PyObject_Hash(PyObject *o) * Part of the Stable ABI.* Compute and return the hash value of an object *o*. On failure, return "-1". This is the equivalent of the Python expression "hash(o)". Changed in version 3.2: The return type is now Py_hash_t. This is a signed integer the same size as "Py_ssize_t". Py_hash_t PyObject_HashNotImplemented(PyObject *o) * Part of the Stable ABI.* Set a "TypeError" indicating that "type(o)" is not *hashable* and return "-1". This function receives special treatment when stored in a "tp_hash" slot, allowing a type to explicitly indicate to the interpreter that it is not hashable. int PyObject_IsTrue(PyObject *o) * Part of the Stable ABI.* Returns "1" if the object *o* is considered to be true, and "0" otherwise. This is equivalent to the Python expression "not not o". On failure, return "-1". int PyObject_Not(PyObject *o) * Part of the Stable ABI.* Returns "0" if the object *o* is considered to be true, and "1" otherwise.
This is equivalent to the Python expression "not o". On failure, return "-1". PyObject *PyObject_Type(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* When *o* is non-"NULL", returns a type object corresponding to the object type of object *o*. On failure, raises "SystemError" and returns "NULL". This is equivalent to the Python expression "type(o)". This function creates a new *strong reference* to the return value. There’s really no reason to use this function instead of the "Py_TYPE()" function, which returns a pointer of type PyTypeObject*, except when a new *strong reference* is needed. int PyObject_TypeCheck(PyObject *o, PyTypeObject *type) Return non-zero if the object *o* is of type *type* or a subtype of *type*, and "0" otherwise. Both parameters must be non-"NULL". Py_ssize_t PyObject_Size(PyObject *o) Py_ssize_t PyObject_Length(PyObject *o) * Part of the Stable ABI.* Return the length of object *o*. If the object *o* provides either the sequence and mapping protocols, the sequence length is returned. On error, "-1" is returned. This is the equivalent to the Python expression "len(o)". Py_ssize_t PyObject_LengthHint(PyObject *o, Py_ssize_t defaultvalue) Return an estimated length for the object *o*. First try to return its actual length, then an estimate using "__length_hint__()", and finally return the default value. On error return "-1". This is the equivalent to the Python expression "operator.length_hint(o, defaultvalue)". Added in version 3.4. PyObject *PyObject_GetItem(PyObject *o, PyObject *key) *Return value: New reference.** Part of the Stable ABI.* Return element of *o* corresponding to the object *key* or "NULL" on failure. This is the equivalent of the Python expression "o[key]". int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *v) * Part of the Stable ABI.* Map the object *key* to the value *v*. Raise an exception and return "-1" on failure; return "0" on success. This is the equivalent of the Python statement "o[key] = v". This function *does not* steal a reference to *v*. int PyObject_DelItem(PyObject *o, PyObject *key) * Part of the Stable ABI.* Remove the mapping for the object *key* from the object *o*. Return "-1" on failure. This is equivalent to the Python statement "del o[key]". int PyObject_DelItemString(PyObject *o, const char *key) * Part of the Stable ABI.* This is the same as "PyObject_DelItem()", but *key* is specified as a const char* UTF-8 encoded bytes string, rather than a PyObject*. PyObject *PyObject_Dir(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* This is equivalent to the Python expression "dir(o)", returning a (possibly empty) list of strings appropriate for the object argument, or "NULL" if there was an error. If the argument is "NULL", this is like the Python "dir()", returning the names of the current locals; in this case, if no execution frame is active then "NULL" is returned but "PyErr_Occurred()" will return false. PyObject *PyObject_GetIter(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* This is equivalent to the Python expression "iter(o)". It returns a new iterator for the object argument, or the object itself if the object is already an iterator. Raises "TypeError" and returns "NULL" if the object cannot be iterated. PyObject *PyObject_SelfIter(PyObject *obj) *Return value: New reference.** Part of the Stable ABI.* This is equivalent to the Python "__iter__(self): return self" method. 
It is intended for *iterator* types, to be used in the "PyTypeObject.tp_iter" slot. PyObject *PyObject_GetAIter(PyObject *o) *Return value: New reference.** Part of the Stable ABI since version 3.10.* This is equivalent to the Python expression "aiter(o)". Takes an "AsyncIterable" object and returns an "AsyncIterator" for it. This is typically a new iterator but if the argument is an "AsyncIterator", this returns itself. Raises "TypeError" and returns "NULL" if the object cannot be iterated. Added in version 3.10. void *PyObject_GetTypeData(PyObject *o, PyTypeObject *cls) * Part of the Stable ABI since version 3.12.* Get a pointer to subclass-specific data reserved for *cls*. The object *o* must be an instance of *cls*, and *cls* must have been created using negative "PyType_Spec.basicsize". Python does not check this. On error, set an exception and return "NULL". Added in version 3.12. Py_ssize_t PyType_GetTypeDataSize(PyTypeObject *cls) * Part of the Stable ABI since version 3.12.* Return the size of the instance memory space reserved for *cls*, i.e. the size of the memory "PyObject_GetTypeData()" returns. This may be larger than requested using "-PyType_Spec.basicsize"; it is safe to use this larger size (e.g. with "memset()"). The type *cls* **must** have been created using negative "PyType_Spec.basicsize". Python does not check this. On error, set an exception and return a negative value. Added in version 3.12. void *PyObject_GetItemData(PyObject *o) Get a pointer to per-item data for a class with "Py_TPFLAGS_ITEMS_AT_END". On error, set an exception and return "NULL". "TypeError" is raised if *o* does not have "Py_TPFLAGS_ITEMS_AT_END" set. Added in version 3.12. int PyObject_VisitManagedDict(PyObject *obj, visitproc visit, void *arg) Visit the managed dictionary of *obj*. This function must only be called in a traverse function of the type which has the "Py_TPFLAGS_MANAGED_DICT" flag set. Added in version 3.13. void PyObject_ClearManagedDict(PyObject *obj) Clear the managed dictionary of *obj*. This function must only be called in a "tp_clear" function of the type which has the "Py_TPFLAGS_MANAGED_DICT" flag set. Added in version 3.13. Object Implementation Support ***************************** This chapter describes the functions, types, and macros used when defining new object types. * Allocating Objects on the Heap * Common Object Structures * Base object types and macros * Implementing functions and methods * Accessing attributes of extension types * Member flags * Member types * Defining Getters and Setters * Type Object Structures * Quick Reference * “tp slots” * sub-slots * slot typedefs * PyTypeObject Definition * PyObject Slots * PyVarObject Slots * PyTypeObject Slots * Static Types * Heap Types * Number Object Structures * Mapping Object Structures * Sequence Object Structures * Buffer Object Structures * Async Object Structures * Slot Type typedefs * Examples * Supporting Cyclic Garbage Collection * Controlling the Garbage Collector State * Querying Garbage Collector State Support for Perf Maps ********************* On supported platforms (as of this writing, only Linux), the runtime can take advantage of *perf map files* to make Python functions visible to an external profiling tool (such as perf). A running process may create a file in the "/tmp" directory, which contains entries that can map a section of executable code to a name. This interface is described in the documentation of the Linux Perf tool.
In Python, these helper APIs can be used by libraries and features that rely on generating machine code on the fly. Note that holding the Global Interpreter Lock (GIL) is not required for these APIs. int PyUnstable_PerfMapState_Init(void) *This is Unstable API. It may change without warning in minor releases.* Open the "/tmp/perf-$pid.map" file, unless it’s already opened, and create a lock to ensure thread-safe writes to the file (provided the writes are done through "PyUnstable_WritePerfMapEntry()"). Normally, there’s no need to call this explicitly; just use "PyUnstable_WritePerfMapEntry()" and it will initialize the state on first call. Returns "0" on success, "-1" on failure to create/open the perf map file, or "-2" on failure to create a lock. Check "errno" for more information about the cause of a failure. int PyUnstable_WritePerfMapEntry(const void *code_addr, unsigned int code_size, const char *entry_name) *This is Unstable API. It may change without warning in minor releases.* Write one single entry to the "/tmp/perf-$pid.map" file. This function is thread safe. Here is what an example entry looks like:

   # address       size  name
   7f3529fcf759    b     py::bar:/run/t.py

Will call "PyUnstable_PerfMapState_Init()" before writing the entry, if the perf map file is not already opened. Returns "0" on success, or the same error codes as "PyUnstable_PerfMapState_Init()" on failure. void PyUnstable_PerfMapState_Fini(void) *This is Unstable API. It may change without warning in minor releases.* Close the perf map file opened by "PyUnstable_PerfMapState_Init()". This is called by the runtime itself during interpreter shut-down. In general, there shouldn’t be a reason to explicitly call this, except to handle specific scenarios such as forking. Reference Counting ****************** The functions and macros in this section are used for managing reference counts of Python objects. Py_ssize_t Py_REFCNT(PyObject *o) Get the reference count of the Python object *o*. Note that the returned value may not actually reflect how many references to the object are actually held. For example, some objects are *immortal* and have a very high refcount that does not reflect the actual number of references. Consequently, do not rely on the returned value to be accurate, other than a value of 0 or 1. Use the "Py_SET_REFCNT()" function to set an object reference count. Changed in version 3.10: "Py_REFCNT()" is changed to an inline static function. Changed in version 3.11: The parameter type is no longer const PyObject*. void Py_SET_REFCNT(PyObject *o, Py_ssize_t refcnt) Set the object *o* reference counter to *refcnt*. On a Python build with free threading, if *refcnt* is larger than "UINT32_MAX", the object is made *immortal*. This function has no effect on *immortal* objects. Added in version 3.9. Changed in version 3.12: Immortal objects are not modified. void Py_INCREF(PyObject *o) Indicate taking a new *strong reference* to object *o*, indicating it is in use and should not be destroyed. This function has no effect on *immortal* objects. This function is usually used to convert a *borrowed reference* to a *strong reference* in-place. The "Py_NewRef()" function can be used to create a new *strong reference*. When done using the object, release it by calling "Py_DECREF()". The object must not be "NULL"; if you aren’t sure that it isn’t "NULL", use "Py_XINCREF()". Do not expect this function to actually modify *o* in any way. For at least **some objects**, this function has no effect.
Changed in version 3.12: Immortal objects are not modified. void Py_XINCREF(PyObject *o) Similar to "Py_INCREF()", but the object *o* can be "NULL", in which case this has no effect. See also "Py_XNewRef()". PyObject *Py_NewRef(PyObject *o) * Part of the Stable ABI since version 3.10.* Create a new *strong reference* to an object: call "Py_INCREF()" on *o* and return the object *o*. When the *strong reference* is no longer needed, "Py_DECREF()" should be called on it to release the reference. The object *o* must not be "NULL"; use "Py_XNewRef()" if *o* can be "NULL". For example:

   Py_INCREF(obj);
   self->attr = obj;

can be written as:

   self->attr = Py_NewRef(obj);

See also "Py_INCREF()". Added in version 3.10. PyObject *Py_XNewRef(PyObject *o) * Part of the Stable ABI since version 3.10.* Similar to "Py_NewRef()", but the object *o* can be "NULL". If the object *o* is "NULL", the function just returns "NULL". Added in version 3.10. void Py_DECREF(PyObject *o) Release a *strong reference* to object *o*, indicating the reference is no longer used. This function has no effect on *immortal* objects. Once the last *strong reference* is released (i.e. the object’s reference count reaches 0), the object’s type’s deallocation function (which must not be "NULL") is invoked. This function is usually used to delete a *strong reference* before exiting its scope. The object must not be "NULL"; if you aren’t sure that it isn’t "NULL", use "Py_XDECREF()". Do not expect this function to actually modify *o* in any way. For at least **some objects**, this function has no effect. Warning: The deallocation function can cause arbitrary Python code to be invoked (e.g. when a class instance with a "__del__()" method is deallocated). While exceptions in such code are not propagated, the executed code has free access to all Python global variables. This means that any object that is reachable from a global variable should be in a consistent state before "Py_DECREF()" is invoked. For example, code to delete an object from a list should copy a reference to the deleted object in a temporary variable, update the list data structure, and then call "Py_DECREF()" for the temporary variable. Changed in version 3.12: Immortal objects are not modified. void Py_XDECREF(PyObject *o) Similar to "Py_DECREF()", but the object *o* can be "NULL", in which case this has no effect. The same warning from "Py_DECREF()" applies here as well. void Py_CLEAR(PyObject *o) Release a *strong reference* for object *o*. The object may be "NULL", in which case the macro has no effect; otherwise the effect is the same as for "Py_DECREF()", except that the argument is also set to "NULL". The warning for "Py_DECREF()" does not apply with respect to the object passed because the macro carefully uses a temporary variable and sets the argument to "NULL" before releasing the reference. It is a good idea to use this macro whenever releasing a reference to an object that might be traversed during garbage collection. Changed in version 3.12: The macro argument is now only evaluated once. If the argument has side effects, these are no longer duplicated. void Py_IncRef(PyObject *o) * Part of the Stable ABI.* Indicate taking a new *strong reference* to object *o*. A function version of "Py_XINCREF()". It can be used for runtime dynamic embedding of Python. void Py_DecRef(PyObject *o) * Part of the Stable ABI.* Release a *strong reference* to object *o*. A function version of "Py_XDECREF()". It can be used for runtime dynamic embedding of Python.
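To tie these functions together, here is a minimal sketch of the common ownership idioms, assuming a hypothetical extension type with a single "PyObject*" field: "Py_NewRef()" stores a strong reference, and "Py_CLEAR()" drops it safely during deallocation:

   #include <Python.h>

   typedef struct {
       PyObject_HEAD
       PyObject *payload;          /* strong reference, or NULL */
   } MyObject;

   static int
   my_set_payload(MyObject *self, PyObject *value)
   {
       PyObject *old = self->payload;
       self->payload = Py_NewRef(value);   /* take our own strong reference */
       Py_XDECREF(old);    /* release the old value only after the new
                              one is in place */
       return 0;
   }

   static void
   my_dealloc(MyObject *self)
   {
       Py_CLEAR(self->payload);    /* safe even if payload is NULL */
       Py_TYPE(self)->tp_free((PyObject *)self);
   }

The replace-then-release dance in "my_set_payload()" is exactly what the "Py_SETREF()" and "Py_XSETREF()" macros, described next, package up.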
Py_SETREF(dst, src) Macro safely releasing a *strong reference* to object *dst* and setting *dst* to *src*. As in the case of "Py_CLEAR()", “the obvious” code can be deadly:

   Py_DECREF(dst);
   dst = src;

The safe way is:

   Py_SETREF(dst, src);

That arranges to set *dst* to *src* _before_ releasing the reference to the old value of *dst*, so that any code triggered as a side-effect of *dst* getting torn down no longer believes *dst* points to a valid object. Added in version 3.6. Changed in version 3.12: The macro arguments are now only evaluated once. If an argument has side effects, these are no longer duplicated. Py_XSETREF(dst, src) Variant of "Py_SETREF" macro that uses "Py_XDECREF()" instead of "Py_DECREF()". Added in version 3.6. Changed in version 3.12: The macro arguments are now only evaluated once. If an argument has side effects, these are no longer duplicated. Reflection ********** PyObject *PyEval_GetBuiltins(void) *Return value: Borrowed reference.** Part of the Stable ABI.* Deprecated since version 3.13: Use "PyEval_GetFrameBuiltins()" instead. Return a dictionary of the builtins in the current execution frame, or the interpreter of the thread state if no frame is currently executing. PyObject *PyEval_GetLocals(void) *Return value: Borrowed reference.** Part of the Stable ABI.* Deprecated since version 3.13: Use either "PyEval_GetFrameLocals()" to obtain the same behaviour as calling "locals()" in Python code, or else call "PyFrame_GetLocals()" on the result of "PyEval_GetFrame()" to access the "f_locals" attribute of the currently executing frame. Return a mapping providing access to the local variables in the current execution frame, or "NULL" if no frame is currently executing. Refer to "locals()" for details of the mapping returned at different scopes. As this function returns a *borrowed reference*, the dictionary returned for *optimized scopes* is cached on the frame object and will remain alive as long as the frame object does. Unlike "PyEval_GetFrameLocals()" and "locals()", subsequent calls to this function in the same frame will update the contents of the cached dictionary to reflect changes in the state of the local variables rather than returning a new snapshot. Changed in version 3.13: As part of **PEP 667**, "PyFrame_GetLocals()", "locals()", and "FrameType.f_locals" no longer make use of the shared cache dictionary. Refer to the What’s New entry for additional details. PyObject *PyEval_GetGlobals(void) *Return value: Borrowed reference.** Part of the Stable ABI.* Deprecated since version 3.13: Use "PyEval_GetFrameGlobals()" instead. Return a dictionary of the global variables in the current execution frame, or "NULL" if no frame is currently executing. PyFrameObject *PyEval_GetFrame(void) *Return value: Borrowed reference.** Part of the Stable ABI.* Return the current thread state’s frame, which is "NULL" if no frame is currently executing. See also "PyThreadState_GetFrame()". PyObject *PyEval_GetFrameBuiltins(void) *Return value: New reference.** Part of the Stable ABI since version 3.13.* Return a dictionary of the builtins in the current execution frame, or the interpreter of the thread state if no frame is currently executing. Added in version 3.13. PyObject *PyEval_GetFrameLocals(void) *Return value: New reference.** Part of the Stable ABI since version 3.13.* Return a dictionary of the local variables in the current execution frame, or "NULL" if no frame is currently executing. Equivalent to calling "locals()" in Python code.
To access "f_locals" on the current frame without making an independent snapshot in *optimized scopes*, call "PyFrame_GetLocals()" on the result of "PyEval_GetFrame()". Added in version 3.13. PyObject *PyEval_GetFrameGlobals(void) *Return value: New reference.** Part of the Stable ABI since version 3.13.* Return a dictionary of the global variables in the current execution frame, or "NULL" if no frame is currently executing. Equivalent to calling "globals()" in Python code. Added in version 3.13. const char *PyEval_GetFuncName(PyObject *func) * Part of the Stable ABI.* Return the name of *func* if it is a function, class or instance object, else the name of *func*s type. const char *PyEval_GetFuncDesc(PyObject *func) * Part of the Stable ABI.* Return a description string, depending on the type of *func*. Return values include “()” for functions and methods, “ constructor”, “ instance”, and “ object”. Concatenated with the result of "PyEval_GetFuncName()", the result will be a description of *func*. Sequence Protocol ***************** int PySequence_Check(PyObject *o) * Part of the Stable ABI.* Return "1" if the object provides the sequence protocol, and "0" otherwise. Note that it returns "1" for Python classes with a "__getitem__()" method, unless they are "dict" subclasses, since in general it is impossible to determine what type of keys the class supports. This function always succeeds. Py_ssize_t PySequence_Size(PyObject *o) Py_ssize_t PySequence_Length(PyObject *o) * Part of the Stable ABI.* Returns the number of objects in sequence *o* on success, and "-1" on failure. This is equivalent to the Python expression "len(o)". PyObject *PySequence_Concat(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Return the concatenation of *o1* and *o2* on success, and "NULL" on failure. This is the equivalent of the Python expression "o1 + o2". PyObject *PySequence_Repeat(PyObject *o, Py_ssize_t count) *Return value: New reference.** Part of the Stable ABI.* Return the result of repeating sequence object *o* *count* times, or "NULL" on failure. This is the equivalent of the Python expression "o * count". PyObject *PySequence_InPlaceConcat(PyObject *o1, PyObject *o2) *Return value: New reference.** Part of the Stable ABI.* Return the concatenation of *o1* and *o2* on success, and "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python expression "o1 += o2". PyObject *PySequence_InPlaceRepeat(PyObject *o, Py_ssize_t count) *Return value: New reference.** Part of the Stable ABI.* Return the result of repeating sequence object *o* *count* times, or "NULL" on failure. The operation is done *in-place* when *o* supports it. This is the equivalent of the Python expression "o *= count". PyObject *PySequence_GetItem(PyObject *o, Py_ssize_t i) *Return value: New reference.** Part of the Stable ABI.* Return the *i*th element of *o*, or "NULL" on failure. This is the equivalent of the Python expression "o[i]". PyObject *PySequence_GetSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2) *Return value: New reference.** Part of the Stable ABI.* Return the slice of sequence object *o* between *i1* and *i2*, or "NULL" on failure. This is the equivalent of the Python expression "o[i1:i2]". int PySequence_SetItem(PyObject *o, Py_ssize_t i, PyObject *v) * Part of the Stable ABI.* Assign object *v* to the *i*th element of *o*. Raise an exception and return "-1" on failure; return "0" on success. 
This is the equivalent of the Python statement "o[i] = v". This function *does not* steal a reference to *v*. If *v* is "NULL", the element is deleted, but this feature is deprecated in favour of using "PySequence_DelItem()". int PySequence_DelItem(PyObject *o, Py_ssize_t i) * Part of the Stable ABI.* Delete the *i*th element of object *o*. Returns "-1" on failure. This is the equivalent of the Python statement "del o[i]". int PySequence_SetSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2, PyObject *v) * Part of the Stable ABI.* Assign the sequence object *v* to the slice in sequence object *o* from *i1* to *i2*. This is the equivalent of the Python statement "o[i1:i2] = v". int PySequence_DelSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2) * Part of the Stable ABI.* Delete the slice in sequence object *o* from *i1* to *i2*. Returns "-1" on failure. This is the equivalent of the Python statement "del o[i1:i2]". Py_ssize_t PySequence_Count(PyObject *o, PyObject *value) * Part of the Stable ABI.* Return the number of occurrences of *value* in *o*, that is, return the number of keys for which "o[key] == value". On failure, return "-1". This is equivalent to the Python expression "o.count(value)". int PySequence_Contains(PyObject *o, PyObject *value) * Part of the Stable ABI.* Determine if *o* contains *value*. If an item in *o* is equal to *value*, return "1", otherwise return "0". On error, return "-1". This is equivalent to the Python expression "value in o". Py_ssize_t PySequence_Index(PyObject *o, PyObject *value) * Part of the Stable ABI.* Return the first index *i* for which "o[i] == value". On error, return "-1". This is equivalent to the Python expression "o.index(value)". PyObject *PySequence_List(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Return a list object with the same contents as the sequence or iterable *o*, or "NULL" on failure. The returned list is guaranteed to be new. This is equivalent to the Python expression "list(o)". PyObject *PySequence_Tuple(PyObject *o) *Return value: New reference.** Part of the Stable ABI.* Return a tuple object with the same contents as the sequence or iterable *o*, or "NULL" on failure. If *o* is a tuple, a new reference will be returned, otherwise a tuple will be constructed with the appropriate contents. This is equivalent to the Python expression "tuple(o)". PyObject *PySequence_Fast(PyObject *o, const char *m) *Return value: New reference.** Part of the Stable ABI.* Return the sequence or iterable *o* as an object usable by the other "PySequence_Fast*" family of functions. If the object is not a sequence or iterable, raises "TypeError" with *m* as the message text. Returns "NULL" on failure. The "PySequence_Fast*" functions are thus named because they assume *o* is a "PyTupleObject" or a "PyListObject" and access the data fields of *o* directly. As a CPython implementation detail, if *o* is already a sequence or list, it will be returned. Py_ssize_t PySequence_Fast_GET_SIZE(PyObject *o) Returns the length of *o*, assuming that *o* was returned by "PySequence_Fast()" and that *o* is not "NULL". The size can also be retrieved by calling "PySequence_Size()" on *o*, but "PySequence_Fast_GET_SIZE()" is faster because it can assume *o* is a list or tuple. PyObject *PySequence_Fast_GET_ITEM(PyObject *o, Py_ssize_t i) *Return value: Borrowed reference.* Return the *i*th element of *o*, assuming that *o* was returned by "PySequence_Fast()", *o* is not "NULL", and that *i* is within bounds. 
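For example, here is a minimal sketch (the function name and the error message are illustrative, not part of the API) that sums an arbitrary iterable of numbers with the "PySequence_Fast*" family:

   static int
   sum_doubles(PyObject *iterable, double *result)
   {
       PyObject *fast = PySequence_Fast(iterable, "expected an iterable of numbers");
       if (fast == NULL) {
           return -1;
       }
       Py_ssize_t n = PySequence_Fast_GET_SIZE(fast);
       double total = 0.0;
       for (Py_ssize_t i = 0; i < n; i++) {
           /* Borrowed reference, valid while 'fast' stays alive. */
           PyObject *item = PySequence_Fast_GET_ITEM(fast, i);
           double v = PyFloat_AsDouble(item);
           if (v == -1.0 && PyErr_Occurred()) {
               Py_DECREF(fast);
               return -1;
           }
           total += v;
       }
       Py_DECREF(fast);
       *result = total;
       return 0;
   }

Note that *fast* must be kept alive for as long as any borrowed item pointer is in use.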
PyObject **PySequence_Fast_ITEMS(PyObject *o) Return the underlying array of PyObject pointers. Assumes that *o* was returned by "PySequence_Fast()" and *o* is not "NULL". Note, if a list gets resized, the reallocation may relocate the items array. So, only use the underlying array pointer in contexts where the sequence cannot change. PyObject *PySequence_ITEM(PyObject *o, Py_ssize_t i) *Return value: New reference.* Return the *i*th element of *o* or "NULL" on failure. Faster form of "PySequence_GetItem()" but without checking that "PySequence_Check()" on *o* is true and without adjustment for negative indices. Set Objects *********** This section details the public API for "set" and "frozenset" objects. Any functionality not listed below is best accessed using either the abstract object protocol (including "PyObject_CallMethod()", "PyObject_RichCompareBool()", "PyObject_Hash()", "PyObject_Repr()", "PyObject_IsTrue()", "PyObject_Print()", and "PyObject_GetIter()") or the abstract number protocol (including "PyNumber_And()", "PyNumber_Subtract()", "PyNumber_Or()", "PyNumber_Xor()", "PyNumber_InPlaceAnd()", "PyNumber_InPlaceSubtract()", "PyNumber_InPlaceOr()", and "PyNumber_InPlaceXor()"). type PySetObject This subtype of "PyObject" is used to hold the internal data for both "set" and "frozenset" objects. It is like a "PyDictObject" in that it is a fixed size for small sets (much like tuple storage) and will point to a separate, variable sized block of memory for medium and large sized sets (much like list storage). None of the fields of this structure should be considered public and all are subject to change. All access should be done through the documented API rather than by manipulating the values in the structure. PyTypeObject PySet_Type * Part of the Stable ABI.* This is an instance of "PyTypeObject" representing the Python "set" type. PyTypeObject PyFrozenSet_Type * Part of the Stable ABI.* This is an instance of "PyTypeObject" representing the Python "frozenset" type. The following type check macros work on pointers to any Python object. Likewise, the constructor functions work with any iterable Python object. int PySet_Check(PyObject *p) Return true if *p* is a "set" object or an instance of a subtype. This function always succeeds. int PyFrozenSet_Check(PyObject *p) Return true if *p* is a "frozenset" object or an instance of a subtype. This function always succeeds. int PyAnySet_Check(PyObject *p) Return true if *p* is a "set" object, a "frozenset" object, or an instance of a subtype. This function always succeeds. int PySet_CheckExact(PyObject *p) Return true if *p* is a "set" object but not an instance of a subtype. This function always succeeds. Added in version 3.10. int PyAnySet_CheckExact(PyObject *p) Return true if *p* is a "set" object or a "frozenset" object but not an instance of a subtype. This function always succeeds. int PyFrozenSet_CheckExact(PyObject *p) Return true if *p* is a "frozenset" object but not an instance of a subtype. This function always succeeds. PyObject *PySet_New(PyObject *iterable) *Return value: New reference.** Part of the Stable ABI.* Return a new "set" containing objects returned by the *iterable*. The *iterable* may be "NULL" to create a new empty set. Return the new set on success or "NULL" on failure. Raise "TypeError" if *iterable* is not actually iterable. The constructor is also useful for copying a set ("c=set(s)"). 
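As a small illustration (the function name is hypothetical; "PySet_Add()" and the other mutation functions are documented just below), creating a set and filling it from C:

   static PyObject *
   set_from_pair(PyObject *a, PyObject *b)
   {
       PyObject *s = PySet_New(NULL);      /* new empty set */
       if (s == NULL) {
           return NULL;
       }
       /* Both keys must be hashable, or PySet_Add() raises TypeError. */
       if (PySet_Add(s, a) < 0 || PySet_Add(s, b) < 0) {
           Py_DECREF(s);
           return NULL;
       }
       return s;                           /* new strong reference */
   }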
PyObject *PyFrozenSet_New(PyObject *iterable) *Return value: New reference.** Part of the Stable ABI.* Return a new "frozenset" containing objects returned by the *iterable*. The *iterable* may be "NULL" to create a new empty frozenset. Return the new set on success or "NULL" on failure. Raise "TypeError" if *iterable* is not actually iterable. The following functions and macros are available for instances of "set" or "frozenset" or instances of their subtypes. Py_ssize_t PySet_Size(PyObject *anyset) * Part of the Stable ABI.* Return the length of a "set" or "frozenset" object. Equivalent to "len(anyset)". Raises a "SystemError" if *anyset* is not a "set", "frozenset", or an instance of a subtype. Py_ssize_t PySet_GET_SIZE(PyObject *anyset) Macro form of "PySet_Size()" without error checking. int PySet_Contains(PyObject *anyset, PyObject *key) * Part of the Stable ABI.* Return "1" if found, "0" if not found, and "-1" if an error is encountered. Unlike the Python "__contains__()" method, this function does not automatically convert unhashable sets into temporary frozensets. Raise a "TypeError" if the *key* is unhashable. Raise "SystemError" if *anyset* is not a "set", "frozenset", or an instance of a subtype. int PySet_Add(PyObject *set, PyObject *key) * Part of the Stable ABI.* Add *key* to a "set" instance. Also works with "frozenset" instances (like "PyTuple_SetItem()" it can be used to fill in the values of brand new frozensets before they are exposed to other code). Return "0" on success or "-1" on failure. Raise a "TypeError" if the *key* is unhashable. Raise a "MemoryError" if there is no room to grow. Raise a "SystemError" if *set* is not an instance of "set" or its subtype. The following functions are available for instances of "set" or its subtypes but not for instances of "frozenset" or its subtypes. int PySet_Discard(PyObject *set, PyObject *key) * Part of the Stable ABI.* Return "1" if found and removed, "0" if not found (no action taken), and "-1" if an error is encountered. Does not raise "KeyError" for missing keys. Raise a "TypeError" if the *key* is unhashable. Unlike the Python "discard()" method, this function does not automatically convert unhashable sets into temporary frozensets. Raise "SystemError" if *set* is not an instance of "set" or its subtype. PyObject *PySet_Pop(PyObject *set) *Return value: New reference.** Part of the Stable ABI.* Return a new reference to an arbitrary object in the *set*, and remove the object from the *set*. Return "NULL" on failure. Raise "KeyError" if the set is empty. Raise a "SystemError" if *set* is not an instance of "set" or its subtype. int PySet_Clear(PyObject *set) * Part of the Stable ABI.* Empty an existing set of all elements. Return "0" on success. Return "-1" and raise "SystemError" if *set* is not an instance of "set" or its subtype. Slice Objects ************* PyTypeObject PySlice_Type * Part of the Stable ABI.* The type object for slice objects. This is the same as "slice" in the Python layer. int PySlice_Check(PyObject *ob) Return true if *ob* is a slice object; *ob* must not be "NULL". This function always succeeds. PyObject *PySlice_New(PyObject *start, PyObject *stop, PyObject *step) *Return value: New reference.** Part of the Stable ABI.* Return a new slice object with the given values. The *start*, *stop*, and *step* parameters are used as the values of the slice object attributes of the same names. Any of the values may be "NULL", in which case "None" will be used for the corresponding attribute.
Return "NULL" with an exception set if the new object could not be allocated. int PySlice_GetIndices(PyObject *slice, Py_ssize_t length, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t *step) * Part of the Stable ABI.* Retrieve the start, stop and step indices from the slice object *slice*, assuming a sequence of length *length*. Treats indices greater than *length* as errors. Returns "0" on success and "-1" on error with no exception set (unless one of the indices was not "None" and failed to be converted to an integer, in which case "-1" is returned with an exception set). You probably do not want to use this function. Changed in version 3.2: The parameter type for the *slice* parameter was "PySliceObject*" before. int PySlice_GetIndicesEx(PyObject *slice, Py_ssize_t length, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t *step, Py_ssize_t *slicelength) * Part of the Stable ABI.* Usable replacement for "PySlice_GetIndices()". Retrieve the start, stop, and step indices from the slice object *slice* assuming a sequence of length *length*, and store the length of the slice in *slicelength*. Out of bounds indices are clipped in a manner consistent with the handling of normal slices. Return "0" on success and "-1" on error with an exception set. Note: This function is considered not safe for resizable sequences. Its invocation should be replaced by a combination of "PySlice_Unpack()" and "PySlice_AdjustIndices()" where if (PySlice_GetIndicesEx(slice, length, &start, &stop, &step, &slicelength) < 0) { // return error } is replaced by if (PySlice_Unpack(slice, &start, &stop, &step) < 0) { // return error } slicelength = PySlice_AdjustIndices(length, &start, &stop, step); Changed in version 3.2: The parameter type for the *slice* parameter was "PySliceObject*" before. Changed in version 3.6.1: If "Py_LIMITED_API" is not set or set to the value between "0x03050400" and "0x03060000" (not including) or "0x03060100" or higher "PySlice_GetIndicesEx()" is implemented as a macro using "PySlice_Unpack()" and "PySlice_AdjustIndices()". Arguments *start*, *stop* and *step* are evaluated more than once. Deprecated since version 3.6.1: If "Py_LIMITED_API" is set to the value less than "0x03050400" or between "0x03060000" and "0x03060100" (not including) "PySlice_GetIndicesEx()" is a deprecated function. int PySlice_Unpack(PyObject *slice, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t *step) * Part of the Stable ABI since version 3.7.* Extract the start, stop and step data members from a slice object as C integers. Silently reduce values larger than "PY_SSIZE_T_MAX" to "PY_SSIZE_T_MAX", silently boost the start and stop values less than "PY_SSIZE_T_MIN" to "PY_SSIZE_T_MIN", and silently boost the step values less than "-PY_SSIZE_T_MAX" to "-PY_SSIZE_T_MAX". Return "-1" with an exception set on error, "0" on success. Added in version 3.6.1. Py_ssize_t PySlice_AdjustIndices(Py_ssize_t length, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t step) * Part of the Stable ABI since version 3.7.* Adjust start/end slice indices assuming a sequence of the specified length. Out of bounds indices are clipped in a manner consistent with the handling of normal slices. Return the length of the slice. Always successful. Doesn’t call Python code. Added in version 3.6.1. Ellipsis Object =============== PyTypeObject PyEllipsis_Type * Part of the Stable ABI.* The type of Python "Ellipsis" object. Same as "types.EllipsisType" in the Python layer. PyObject *Py_Ellipsis The Python "Ellipsis" object. This object has no methods. 
Like "Py_None", it is an *immortal* singleton object. Changed in version 3.12: "Py_Ellipsis" is immortal. C API Stability *************** Unless documented otherwise, Python’s C API is covered by the Backwards Compatibility Policy, **PEP 387**. Most changes to it are source-compatible (typically by only adding new API). Changing existing API or removing API is only done after a deprecation period or to fix serious issues. CPython’s Application Binary Interface (ABI) is forward- and backwards-compatible across a minor release (if these are compiled the same way; see Platform Considerations below). So, code compiled for Python 3.10.0 will work on 3.10.8 and vice versa, but will need to be compiled separately for 3.9.x and 3.11.x. There are two tiers of C API with different stability expectations: * Unstable API, may change in minor versions without a deprecation period. It is marked by the "PyUnstable" prefix in names. * Limited API, is compatible across several minor releases. When "Py_LIMITED_API" is defined, only this subset is exposed from "Python.h". These are discussed in more detail below. Names prefixed by an underscore, such as "_Py_InternalState", are private API that can change without notice even in patch releases. If you need to use this API, consider reaching out to CPython developers to discuss adding public API for your use case. Unstable C API ============== Any API named with the "PyUnstable" prefix exposes CPython implementation details, and may change in every minor release (e.g. from 3.9 to 3.10) without any deprecation warnings. However, it will not change in a bugfix release (e.g. from 3.10.0 to 3.10.1). It is generally intended for specialized, low-level tools like debuggers. Projects that use this API are expected to follow CPython development and spend extra effort adjusting to changes. Stable Application Binary Interface =================================== For simplicity, this document talks about *extensions*, but the Limited API and Stable ABI work the same way for all uses of the API – for example, embedding Python. Limited C API ------------- Python 3.2 introduced the *Limited API*, a subset of Python’s C API. Extensions that only use the Limited API can be compiled once and be loaded on multiple versions of Python. Contents of the Limited API are listed below. Py_LIMITED_API Define this macro before including "Python.h" to opt in to only use the Limited API, and to select the Limited API version. Define "Py_LIMITED_API" to the value of "PY_VERSION_HEX" corresponding to the lowest Python version your extension supports. The extension will be ABI-compatible with all Python 3 releases from the specified one onward, and can use Limited API introduced up to that version. Rather than using the "PY_VERSION_HEX" macro directly, hardcode a minimum minor version (e.g. "0x030A0000" for Python 3.10) for stability when compiling with future Python versions. You can also define "Py_LIMITED_API" to "3". This works the same as "0x03020000" (Python 3.2, the version that introduced Limited API). Stable ABI ---------- To enable this, Python provides a *Stable ABI*: a set of symbols that will remain ABI-compatible across Python 3.x versions. Note: The Stable ABI prevents ABI issues, like linker errors due to missing symbols or data corruption due to changes in structure layouts or function signatures. However, other changes in Python can change the *behavior* of extensions. See Python’s Backwards Compatibility Policy (**PEP 387**) for details. 
The Stable ABI contains symbols exposed in the Limited API, but also other ones – for example, functions necessary to support older versions of the Limited API. On Windows, extensions that use the Stable ABI should be linked against "python3.dll" rather than a version-specific library such as "python39.dll". On some platforms, Python will look for and load shared library files named with the "abi3" tag (e.g. "mymodule.abi3.so"). It does not check if such extensions conform to a Stable ABI. The user (or their packaging tools) needs to ensure that, for example, extensions built with the 3.10+ Limited API are not installed for lower versions of Python. All functions in the Stable ABI are present as functions in Python’s shared library, not solely as macros. This makes them usable from languages that don’t use the C preprocessor. Limited API Scope and Performance --------------------------------- The goal for the Limited API is to allow everything that is possible with the full C API, but possibly with a performance penalty. For example, while "PyList_GetItem()" is available, its “unsafe” macro variant "PyList_GET_ITEM()" is not. The macro can be faster because it can rely on version-specific implementation details of the list object. Without "Py_LIMITED_API" defined, some C API functions are inlined or replaced by macros. Defining "Py_LIMITED_API" disables this inlining, allowing stability as Python’s data structures are improved, but possibly reducing performance. By leaving out the "Py_LIMITED_API" definition, it is possible to compile a Limited API extension with a version-specific ABI. This can improve performance for that Python version, but will limit compatibility. Compiling with "Py_LIMITED_API" will then yield an extension that can be distributed where a version-specific one is not available – for example, for prereleases of an upcoming Python version. Limited API Caveats ------------------- Note that compiling with "Py_LIMITED_API" is *not* a complete guarantee that code conforms to the Limited API or the Stable ABI. "Py_LIMITED_API" only covers definitions, but an API also includes other issues, such as expected semantics. One issue that "Py_LIMITED_API" does not guard against is calling a function with arguments that are invalid in a lower Python version. For example, consider a function that starts accepting "NULL" for an argument. In Python 3.9, "NULL" now selects a default behavior, but in Python 3.8, the argument will be used directly, causing a "NULL" dereference and crash. A similar argument works for fields of structs. Another issue is that some struct fields are currently not hidden when "Py_LIMITED_API" is defined, even though they’re part of the Limited API. For these reasons, we recommend testing an extension with *all* minor Python versions it supports, and preferably building with the *lowest* such version. We also recommend reviewing documentation of all used API to check if it is explicitly part of the Limited API. Even with "Py_LIMITED_API" defined, a few private declarations are exposed for technical reasons (or even unintentionally, as bugs). Also note that the Limited API is not necessarily stable: compiling with "Py_LIMITED_API" with Python 3.8 means that the extension will run with Python 3.12, but it will not necessarily *compile* with Python 3.12. In particular, parts of the Limited API may be deprecated and removed, provided that the Stable ABI stays stable.
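To make this concrete, here is a minimal sketch of an extension module that opts in to the Limited API; the module name and the chosen version value ("0x030A0000", Python 3.10) are examples only:

   #define Py_LIMITED_API 0x030A0000   /* lowest Python version supported */
   #include <Python.h>

   static struct PyModuleDef examplemodule = {
       PyModuleDef_HEAD_INIT,
       "example",                      /* hypothetical module name */
       NULL,                           /* no docstring */
       0,                              /* no per-module state */
       NULL,                           /* no methods */
   };

   PyMODINIT_FUNC
   PyInit_example(void)
   {
       return PyModule_Create(&examplemodule);
   }

Built this way, the resulting binary can be tagged "abi3" and loaded on Python 3.10 and later, subject to the caveats above.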
Platform Considerations ======================= ABI stability depends not only on Python, but also on the compiler used, lower-level libraries and compiler options. For the purposes of the Stable ABI, these details define a “platform”. They usually depend on the OS type and processor architecture. It is the responsibility of each particular distributor of Python to ensure that all Python versions on a particular platform are built in a way that does not break the Stable ABI. This is the case with Windows and macOS releases from "python.org" and many third-party distributors. Contents of Limited API ======================= Currently, the Limited API includes the following items: * "PY_VECTORCALL_ARGUMENTS_OFFSET" * "PyAIter_Check()" * "PyArg_Parse()" * "PyArg_ParseTuple()" * "PyArg_ParseTupleAndKeywords()" * "PyArg_UnpackTuple()" * "PyArg_VaParse()" * "PyArg_VaParseTupleAndKeywords()" * "PyArg_ValidateKeywordArguments()" * "PyBaseObject_Type" * "PyBool_FromLong()" * "PyBool_Type" * "PyBuffer_FillContiguousStrides()" * "PyBuffer_FillInfo()" * "PyBuffer_FromContiguous()" * "PyBuffer_GetPointer()" * "PyBuffer_IsContiguous()" * "PyBuffer_Release()" * "PyBuffer_SizeFromFormat()" * "PyBuffer_ToContiguous()" * "PyByteArrayIter_Type" * "PyByteArray_AsString()" * "PyByteArray_Concat()" * "PyByteArray_FromObject()" * "PyByteArray_FromStringAndSize()" * "PyByteArray_Resize()" * "PyByteArray_Size()" * "PyByteArray_Type" * "PyBytesIter_Type" * "PyBytes_AsString()" * "PyBytes_AsStringAndSize()" * "PyBytes_Concat()" * "PyBytes_ConcatAndDel()" * "PyBytes_DecodeEscape()" * "PyBytes_FromFormat()" * "PyBytes_FromFormatV()" * "PyBytes_FromObject()" * "PyBytes_FromString()" * "PyBytes_FromStringAndSize()" * "PyBytes_Repr()" * "PyBytes_Size()" * "PyBytes_Type" * "PyCFunction" * "PyCFunctionFast" * "PyCFunctionFastWithKeywords" * "PyCFunctionWithKeywords" * "PyCFunction_GetFlags()" * "PyCFunction_GetFunction()" * "PyCFunction_GetSelf()" * "PyCFunction_New()" * "PyCFunction_NewEx()" * "PyCFunction_Type" * "PyCMethod_New()" * "PyCallIter_New()" * "PyCallIter_Type" * "PyCallable_Check()" * "PyCapsule_Destructor" * "PyCapsule_GetContext()" * "PyCapsule_GetDestructor()" * "PyCapsule_GetName()" * "PyCapsule_GetPointer()" * "PyCapsule_Import()" * "PyCapsule_IsValid()" * "PyCapsule_New()" * "PyCapsule_SetContext()" * "PyCapsule_SetDestructor()" * "PyCapsule_SetName()" * "PyCapsule_SetPointer()" * "PyCapsule_Type" * "PyClassMethodDescr_Type" * "PyCodec_BackslashReplaceErrors()" * "PyCodec_Decode()" * "PyCodec_Decoder()" * "PyCodec_Encode()" * "PyCodec_Encoder()" * "PyCodec_IgnoreErrors()" * "PyCodec_IncrementalDecoder()" * "PyCodec_IncrementalEncoder()" * "PyCodec_KnownEncoding()" * "PyCodec_LookupError()" * "PyCodec_NameReplaceErrors()" * "PyCodec_Register()" * "PyCodec_RegisterError()" * "PyCodec_ReplaceErrors()" * "PyCodec_StreamReader()" * "PyCodec_StreamWriter()" * "PyCodec_StrictErrors()" * "PyCodec_Unregister()" * "PyCodec_XMLCharRefReplaceErrors()" * "PyComplex_FromDoubles()" * "PyComplex_ImagAsDouble()" * "PyComplex_RealAsDouble()" * "PyComplex_Type" * "PyDescr_NewClassMethod()" * "PyDescr_NewGetSet()" * "PyDescr_NewMember()" * "PyDescr_NewMethod()" * "PyDictItems_Type" * "PyDictIterItem_Type" * "PyDictIterKey_Type" * "PyDictIterValue_Type" * "PyDictKeys_Type" * "PyDictProxy_New()" * "PyDictProxy_Type" * "PyDictRevIterItem_Type" * "PyDictRevIterKey_Type" * "PyDictRevIterValue_Type" * "PyDictValues_Type" * "PyDict_Clear()" * "PyDict_Contains()" * "PyDict_Copy()" * "PyDict_DelItem()" * "PyDict_DelItemString()" *
"PyDict_GetItem()" * "PyDict_GetItemRef()" * "PyDict_GetItemString()" * "PyDict_GetItemStringRef()" * "PyDict_GetItemWithError()" * "PyDict_Items()" * "PyDict_Keys()" * "PyDict_Merge()" * "PyDict_MergeFromSeq2()" * "PyDict_New()" * "PyDict_Next()" * "PyDict_SetItem()" * "PyDict_SetItemString()" * "PyDict_Size()" * "PyDict_Type" * "PyDict_Update()" * "PyDict_Values()" * "PyEllipsis_Type" * "PyEnum_Type" * "PyErr_BadArgument()" * "PyErr_BadInternalCall()" * "PyErr_CheckSignals()" * "PyErr_Clear()" * "PyErr_Display()" * "PyErr_DisplayException()" * "PyErr_ExceptionMatches()" * "PyErr_Fetch()" * "PyErr_Format()" * "PyErr_FormatV()" * "PyErr_GetExcInfo()" * "PyErr_GetHandledException()" * "PyErr_GetRaisedException()" * "PyErr_GivenExceptionMatches()" * "PyErr_NewException()" * "PyErr_NewExceptionWithDoc()" * "PyErr_NoMemory()" * "PyErr_NormalizeException()" * "PyErr_Occurred()" * "PyErr_Print()" * "PyErr_PrintEx()" * "PyErr_ProgramText()" * "PyErr_ResourceWarning()" * "PyErr_Restore()" * "PyErr_SetExcFromWindowsErr()" * "PyErr_SetExcFromWindowsErrWithFilename()" * "PyErr_SetExcFromWindowsErrWithFilenameObject()" * "PyErr_SetExcFromWindowsErrWithFilenameObjects()" * "PyErr_SetExcInfo()" * "PyErr_SetFromErrno()" * "PyErr_SetFromErrnoWithFilename()" * "PyErr_SetFromErrnoWithFilenameObject()" * "PyErr_SetFromErrnoWithFilenameObjects()" * "PyErr_SetFromWindowsErr()" * "PyErr_SetFromWindowsErrWithFilename()" * "PyErr_SetHandledException()" * "PyErr_SetImportError()" * "PyErr_SetImportErrorSubclass()" * "PyErr_SetInterrupt()" * "PyErr_SetInterruptEx()" * "PyErr_SetNone()" * "PyErr_SetObject()" * "PyErr_SetRaisedException()" * "PyErr_SetString()" * "PyErr_SyntaxLocation()" * "PyErr_SyntaxLocationEx()" * "PyErr_WarnEx()" * "PyErr_WarnExplicit()" * "PyErr_WarnFormat()" * "PyErr_WriteUnraisable()" * "PyEval_AcquireThread()" * "PyEval_EvalCode()" * "PyEval_EvalCodeEx()" * "PyEval_EvalFrame()" * "PyEval_EvalFrameEx()" * "PyEval_GetBuiltins()" * "PyEval_GetFrame()" * "PyEval_GetFrameBuiltins()" * "PyEval_GetFrameGlobals()" * "PyEval_GetFrameLocals()" * "PyEval_GetFuncDesc()" * "PyEval_GetFuncName()" * "PyEval_GetGlobals()" * "PyEval_GetLocals()" * "PyEval_InitThreads()" * "PyEval_ReleaseThread()" * "PyEval_RestoreThread()" * "PyEval_SaveThread()" * "PyExc_ArithmeticError" * "PyExc_AssertionError" * "PyExc_AttributeError" * "PyExc_BaseException" * "PyExc_BaseExceptionGroup" * "PyExc_BlockingIOError" * "PyExc_BrokenPipeError" * "PyExc_BufferError" * "PyExc_BytesWarning" * "PyExc_ChildProcessError" * "PyExc_ConnectionAbortedError" * "PyExc_ConnectionError" * "PyExc_ConnectionRefusedError" * "PyExc_ConnectionResetError" * "PyExc_DeprecationWarning" * "PyExc_EOFError" * "PyExc_EncodingWarning" * "PyExc_EnvironmentError" * "PyExc_Exception" * "PyExc_FileExistsError" * "PyExc_FileNotFoundError" * "PyExc_FloatingPointError" * "PyExc_FutureWarning" * "PyExc_GeneratorExit" * "PyExc_IOError" * "PyExc_ImportError" * "PyExc_ImportWarning" * "PyExc_IndentationError" * "PyExc_IndexError" * "PyExc_InterruptedError" * "PyExc_IsADirectoryError" * "PyExc_KeyError" * "PyExc_KeyboardInterrupt" * "PyExc_LookupError" * "PyExc_MemoryError" * "PyExc_ModuleNotFoundError" * "PyExc_NameError" * "PyExc_NotADirectoryError" * "PyExc_NotImplementedError" * "PyExc_OSError" * "PyExc_OverflowError" * "PyExc_PendingDeprecationWarning" * "PyExc_PermissionError" * "PyExc_ProcessLookupError" * "PyExc_RecursionError" * "PyExc_ReferenceError" * "PyExc_ResourceWarning" * "PyExc_RuntimeError" * "PyExc_RuntimeWarning" * "PyExc_StopAsyncIteration" * 
"PyExc_StopIteration" * "PyExc_SyntaxError" * "PyExc_SyntaxWarning" * "PyExc_SystemError" * "PyExc_SystemExit" * "PyExc_TabError" * "PyExc_TimeoutError" * "PyExc_TypeError" * "PyExc_UnboundLocalError" * "PyExc_UnicodeDecodeError" * "PyExc_UnicodeEncodeError" * "PyExc_UnicodeError" * "PyExc_UnicodeTranslateError" * "PyExc_UnicodeWarning" * "PyExc_UserWarning" * "PyExc_ValueError" * "PyExc_Warning" * "PyExc_WindowsError" * "PyExc_ZeroDivisionError" * "PyExceptionClass_Name()" * "PyException_GetArgs()" * "PyException_GetCause()" * "PyException_GetContext()" * "PyException_GetTraceback()" * "PyException_SetArgs()" * "PyException_SetCause()" * "PyException_SetContext()" * "PyException_SetTraceback()" * "PyFile_FromFd()" * "PyFile_GetLine()" * "PyFile_WriteObject()" * "PyFile_WriteString()" * "PyFilter_Type" * "PyFloat_AsDouble()" * "PyFloat_FromDouble()" * "PyFloat_FromString()" * "PyFloat_GetInfo()" * "PyFloat_GetMax()" * "PyFloat_GetMin()" * "PyFloat_Type" * "PyFrameObject" * "PyFrame_GetCode()" * "PyFrame_GetLineNumber()" * "PyFrozenSet_New()" * "PyFrozenSet_Type" * "PyGC_Collect()" * "PyGC_Disable()" * "PyGC_Enable()" * "PyGC_IsEnabled()" * "PyGILState_Ensure()" * "PyGILState_GetThisThreadState()" * "PyGILState_Release()" * "PyGILState_STATE" * "PyGetSetDef" * "PyGetSetDescr_Type" * "PyImport_AddModule()" * "PyImport_AddModuleObject()" * "PyImport_AddModuleRef()" * "PyImport_AppendInittab()" * "PyImport_ExecCodeModule()" * "PyImport_ExecCodeModuleEx()" * "PyImport_ExecCodeModuleObject()" * "PyImport_ExecCodeModuleWithPathnames()" * "PyImport_GetImporter()" * "PyImport_GetMagicNumber()" * "PyImport_GetMagicTag()" * "PyImport_GetModule()" * "PyImport_GetModuleDict()" * "PyImport_Import()" * "PyImport_ImportFrozenModule()" * "PyImport_ImportFrozenModuleObject()" * "PyImport_ImportModule()" * "PyImport_ImportModuleLevel()" * "PyImport_ImportModuleLevelObject()" * "PyImport_ImportModuleNoBlock()" * "PyImport_ReloadModule()" * "PyIndex_Check()" * "PyInterpreterState" * "PyInterpreterState_Clear()" * "PyInterpreterState_Delete()" * "PyInterpreterState_Get()" * "PyInterpreterState_GetDict()" * "PyInterpreterState_GetID()" * "PyInterpreterState_New()" * "PyIter_Check()" * "PyIter_Next()" * "PyIter_Send()" * "PyListIter_Type" * "PyListRevIter_Type" * "PyList_Append()" * "PyList_AsTuple()" * "PyList_GetItem()" * "PyList_GetItemRef()" * "PyList_GetSlice()" * "PyList_Insert()" * "PyList_New()" * "PyList_Reverse()" * "PyList_SetItem()" * "PyList_SetSlice()" * "PyList_Size()" * "PyList_Sort()" * "PyList_Type" * "PyLongObject" * "PyLongRangeIter_Type" * "PyLong_AsDouble()" * "PyLong_AsInt()" * "PyLong_AsLong()" * "PyLong_AsLongAndOverflow()" * "PyLong_AsLongLong()" * "PyLong_AsLongLongAndOverflow()" * "PyLong_AsSize_t()" * "PyLong_AsSsize_t()" * "PyLong_AsUnsignedLong()" * "PyLong_AsUnsignedLongLong()" * "PyLong_AsUnsignedLongLongMask()" * "PyLong_AsUnsignedLongMask()" * "PyLong_AsVoidPtr()" * "PyLong_FromDouble()" * "PyLong_FromLong()" * "PyLong_FromLongLong()" * "PyLong_FromSize_t()" * "PyLong_FromSsize_t()" * "PyLong_FromString()" * "PyLong_FromUnsignedLong()" * "PyLong_FromUnsignedLongLong()" * "PyLong_FromVoidPtr()" * "PyLong_GetInfo()" * "PyLong_Type" * "PyMap_Type" * "PyMapping_Check()" * "PyMapping_GetItemString()" * "PyMapping_GetOptionalItem()" * "PyMapping_GetOptionalItemString()" * "PyMapping_HasKey()" * "PyMapping_HasKeyString()" * "PyMapping_HasKeyStringWithError()" * "PyMapping_HasKeyWithError()" * "PyMapping_Items()" * "PyMapping_Keys()" * "PyMapping_Length()" * "PyMapping_SetItemString()" 
* "PyMapping_Size()" * "PyMapping_Values()" * "PyMem_Calloc()" * "PyMem_Free()" * "PyMem_Malloc()" * "PyMem_RawCalloc()" * "PyMem_RawFree()" * "PyMem_RawMalloc()" * "PyMem_RawRealloc()" * "PyMem_Realloc()" * "PyMemberDef" * "PyMemberDescr_Type" * "PyMember_GetOne()" * "PyMember_SetOne()" * "PyMemoryView_FromBuffer()" * "PyMemoryView_FromMemory()" * "PyMemoryView_FromObject()" * "PyMemoryView_GetContiguous()" * "PyMemoryView_Type" * "PyMethodDef" * "PyMethodDescr_Type" * "PyModuleDef" * "PyModuleDef_Base" * "PyModuleDef_Init()" * "PyModuleDef_Type" * "PyModule_Add()" * "PyModule_AddFunctions()" * "PyModule_AddIntConstant()" * "PyModule_AddObject()" * "PyModule_AddObjectRef()" * "PyModule_AddStringConstant()" * "PyModule_AddType()" * "PyModule_Create2()" * "PyModule_ExecDef()" * "PyModule_FromDefAndSpec2()" * "PyModule_GetDef()" * "PyModule_GetDict()" * "PyModule_GetFilename()" * "PyModule_GetFilenameObject()" * "PyModule_GetName()" * "PyModule_GetNameObject()" * "PyModule_GetState()" * "PyModule_New()" * "PyModule_NewObject()" * "PyModule_SetDocString()" * "PyModule_Type" * "PyNumber_Absolute()" * "PyNumber_Add()" * "PyNumber_And()" * "PyNumber_AsSsize_t()" * "PyNumber_Check()" * "PyNumber_Divmod()" * "PyNumber_Float()" * "PyNumber_FloorDivide()" * "PyNumber_InPlaceAdd()" * "PyNumber_InPlaceAnd()" * "PyNumber_InPlaceFloorDivide()" * "PyNumber_InPlaceLshift()" * "PyNumber_InPlaceMatrixMultiply()" * "PyNumber_InPlaceMultiply()" * "PyNumber_InPlaceOr()" * "PyNumber_InPlacePower()" * "PyNumber_InPlaceRemainder()" * "PyNumber_InPlaceRshift()" * "PyNumber_InPlaceSubtract()" * "PyNumber_InPlaceTrueDivide()" * "PyNumber_InPlaceXor()" * "PyNumber_Index()" * "PyNumber_Invert()" * "PyNumber_Long()" * "PyNumber_Lshift()" * "PyNumber_MatrixMultiply()" * "PyNumber_Multiply()" * "PyNumber_Negative()" * "PyNumber_Or()" * "PyNumber_Positive()" * "PyNumber_Power()" * "PyNumber_Remainder()" * "PyNumber_Rshift()" * "PyNumber_Subtract()" * "PyNumber_ToBase()" * "PyNumber_TrueDivide()" * "PyNumber_Xor()" * "PyOS_AfterFork()" * "PyOS_AfterFork_Child()" * "PyOS_AfterFork_Parent()" * "PyOS_BeforeFork()" * "PyOS_CheckStack()" * "PyOS_FSPath()" * "PyOS_InputHook" * "PyOS_InterruptOccurred()" * "PyOS_double_to_string()" * "PyOS_getsig()" * "PyOS_mystricmp()" * "PyOS_mystrnicmp()" * "PyOS_setsig()" * "PyOS_sighandler_t" * "PyOS_snprintf()" * "PyOS_string_to_double()" * "PyOS_strtol()" * "PyOS_strtoul()" * "PyOS_vsnprintf()" * "PyObject" * "PyObject.ob_refcnt" * "PyObject.ob_type" * "PyObject_ASCII()" * "PyObject_AsFileDescriptor()" * "PyObject_Bytes()" * "PyObject_Call()" * "PyObject_CallFunction()" * "PyObject_CallFunctionObjArgs()" * "PyObject_CallMethod()" * "PyObject_CallMethodObjArgs()" * "PyObject_CallNoArgs()" * "PyObject_CallObject()" * "PyObject_Calloc()" * "PyObject_CheckBuffer()" * "PyObject_ClearWeakRefs()" * "PyObject_CopyData()" * "PyObject_DelAttr()" * "PyObject_DelAttrString()" * "PyObject_DelItem()" * "PyObject_DelItemString()" * "PyObject_Dir()" * "PyObject_Format()" * "PyObject_Free()" * "PyObject_GC_Del()" * "PyObject_GC_IsFinalized()" * "PyObject_GC_IsTracked()" * "PyObject_GC_Track()" * "PyObject_GC_UnTrack()" * "PyObject_GenericGetAttr()" * "PyObject_GenericGetDict()" * "PyObject_GenericSetAttr()" * "PyObject_GenericSetDict()" * "PyObject_GetAIter()" * "PyObject_GetAttr()" * "PyObject_GetAttrString()" * "PyObject_GetBuffer()" * "PyObject_GetItem()" * "PyObject_GetIter()" * "PyObject_GetOptionalAttr()" * "PyObject_GetOptionalAttrString()" * "PyObject_GetTypeData()" * "PyObject_HasAttr()" * 
"PyObject_HasAttrString()" * "PyObject_HasAttrStringWithError()" * "PyObject_HasAttrWithError()" * "PyObject_Hash()" * "PyObject_HashNotImplemented()" * "PyObject_Init()" * "PyObject_InitVar()" * "PyObject_IsInstance()" * "PyObject_IsSubclass()" * "PyObject_IsTrue()" * "PyObject_Length()" * "PyObject_Malloc()" * "PyObject_Not()" * "PyObject_Realloc()" * "PyObject_Repr()" * "PyObject_RichCompare()" * "PyObject_RichCompareBool()" * "PyObject_SelfIter()" * "PyObject_SetAttr()" * "PyObject_SetAttrString()" * "PyObject_SetItem()" * "PyObject_Size()" * "PyObject_Str()" * "PyObject_Type()" * "PyObject_Vectorcall()" * "PyObject_VectorcallMethod()" * "PyProperty_Type" * "PyRangeIter_Type" * "PyRange_Type" * "PyReversed_Type" * "PySeqIter_New()" * "PySeqIter_Type" * "PySequence_Check()" * "PySequence_Concat()" * "PySequence_Contains()" * "PySequence_Count()" * "PySequence_DelItem()" * "PySequence_DelSlice()" * "PySequence_Fast()" * "PySequence_GetItem()" * "PySequence_GetSlice()" * "PySequence_In()" * "PySequence_InPlaceConcat()" * "PySequence_InPlaceRepeat()" * "PySequence_Index()" * "PySequence_Length()" * "PySequence_List()" * "PySequence_Repeat()" * "PySequence_SetItem()" * "PySequence_SetSlice()" * "PySequence_Size()" * "PySequence_Tuple()" * "PySetIter_Type" * "PySet_Add()" * "PySet_Clear()" * "PySet_Contains()" * "PySet_Discard()" * "PySet_New()" * "PySet_Pop()" * "PySet_Size()" * "PySet_Type" * "PySlice_AdjustIndices()" * "PySlice_GetIndices()" * "PySlice_GetIndicesEx()" * "PySlice_New()" * "PySlice_Type" * "PySlice_Unpack()" * "PyState_AddModule()" * "PyState_FindModule()" * "PyState_RemoveModule()" * "PyStructSequence_Desc" * "PyStructSequence_Field" * "PyStructSequence_GetItem()" * "PyStructSequence_New()" * "PyStructSequence_NewType()" * "PyStructSequence_SetItem()" * "PyStructSequence_UnnamedField" * "PySuper_Type" * "PySys_Audit()" * "PySys_AuditTuple()" * "PySys_FormatStderr()" * "PySys_FormatStdout()" * "PySys_GetObject()" * "PySys_GetXOptions()" * "PySys_ResetWarnOptions()" * "PySys_SetArgv()" * "PySys_SetArgvEx()" * "PySys_SetObject()" * "PySys_WriteStderr()" * "PySys_WriteStdout()" * "PyThreadState" * "PyThreadState_Clear()" * "PyThreadState_Delete()" * "PyThreadState_Get()" * "PyThreadState_GetDict()" * "PyThreadState_GetFrame()" * "PyThreadState_GetID()" * "PyThreadState_GetInterpreter()" * "PyThreadState_New()" * "PyThreadState_SetAsyncExc()" * "PyThreadState_Swap()" * "PyThread_GetInfo()" * "PyThread_ReInitTLS()" * "PyThread_acquire_lock()" * "PyThread_acquire_lock_timed()" * "PyThread_allocate_lock()" * "PyThread_create_key()" * "PyThread_delete_key()" * "PyThread_delete_key_value()" * "PyThread_exit_thread()" * "PyThread_free_lock()" * "PyThread_get_key_value()" * "PyThread_get_stacksize()" * "PyThread_get_thread_ident()" * "PyThread_get_thread_native_id()" * "PyThread_init_thread()" * "PyThread_release_lock()" * "PyThread_set_key_value()" * "PyThread_set_stacksize()" * "PyThread_start_new_thread()" * "PyThread_tss_alloc()" * "PyThread_tss_create()" * "PyThread_tss_delete()" * "PyThread_tss_free()" * "PyThread_tss_get()" * "PyThread_tss_is_created()" * "PyThread_tss_set()" * "PyTraceBack_Here()" * "PyTraceBack_Print()" * "PyTraceBack_Type" * "PyTupleIter_Type" * "PyTuple_GetItem()" * "PyTuple_GetSlice()" * "PyTuple_New()" * "PyTuple_Pack()" * "PyTuple_SetItem()" * "PyTuple_Size()" * "PyTuple_Type" * "PyTypeObject" * "PyType_ClearCache()" * "PyType_FromMetaclass()" * "PyType_FromModuleAndSpec()" * "PyType_FromSpec()" * "PyType_FromSpecWithBases()" * "PyType_GenericAlloc()" * 
"PyType_GenericNew()" * "PyType_GetFlags()" * "PyType_GetFullyQualifiedName()" * "PyType_GetModule()" * "PyType_GetModuleByDef()" * "PyType_GetModuleName()" * "PyType_GetModuleState()" * "PyType_GetName()" * "PyType_GetQualName()" * "PyType_GetSlot()" * "PyType_GetTypeDataSize()" * "PyType_IsSubtype()" * "PyType_Modified()" * "PyType_Ready()" * "PyType_Slot" * "PyType_Spec" * "PyType_Type" * "PyUnicodeDecodeError_Create()" * "PyUnicodeDecodeError_GetEncoding()" * "PyUnicodeDecodeError_GetEnd()" * "PyUnicodeDecodeError_GetObject()" * "PyUnicodeDecodeError_GetReason()" * "PyUnicodeDecodeError_GetStart()" * "PyUnicodeDecodeError_SetEnd()" * "PyUnicodeDecodeError_SetReason()" * "PyUnicodeDecodeError_SetStart()" * "PyUnicodeEncodeError_GetEncoding()" * "PyUnicodeEncodeError_GetEnd()" * "PyUnicodeEncodeError_GetObject()" * "PyUnicodeEncodeError_GetReason()" * "PyUnicodeEncodeError_GetStart()" * "PyUnicodeEncodeError_SetEnd()" * "PyUnicodeEncodeError_SetReason()" * "PyUnicodeEncodeError_SetStart()" * "PyUnicodeIter_Type" * "PyUnicodeTranslateError_GetEnd()" * "PyUnicodeTranslateError_GetObject()" * "PyUnicodeTranslateError_GetReason()" * "PyUnicodeTranslateError_GetStart()" * "PyUnicodeTranslateError_SetEnd()" * "PyUnicodeTranslateError_SetReason()" * "PyUnicodeTranslateError_SetStart()" * "PyUnicode_Append()" * "PyUnicode_AppendAndDel()" * "PyUnicode_AsASCIIString()" * "PyUnicode_AsCharmapString()" * "PyUnicode_AsDecodedObject()" * "PyUnicode_AsDecodedUnicode()" * "PyUnicode_AsEncodedObject()" * "PyUnicode_AsEncodedString()" * "PyUnicode_AsEncodedUnicode()" * "PyUnicode_AsLatin1String()" * "PyUnicode_AsMBCSString()" * "PyUnicode_AsRawUnicodeEscapeString()" * "PyUnicode_AsUCS4()" * "PyUnicode_AsUCS4Copy()" * "PyUnicode_AsUTF16String()" * "PyUnicode_AsUTF32String()" * "PyUnicode_AsUTF8AndSize()" * "PyUnicode_AsUTF8String()" * "PyUnicode_AsUnicodeEscapeString()" * "PyUnicode_AsWideChar()" * "PyUnicode_AsWideCharString()" * "PyUnicode_BuildEncodingMap()" * "PyUnicode_Compare()" * "PyUnicode_CompareWithASCIIString()" * "PyUnicode_Concat()" * "PyUnicode_Contains()" * "PyUnicode_Count()" * "PyUnicode_Decode()" * "PyUnicode_DecodeASCII()" * "PyUnicode_DecodeCharmap()" * "PyUnicode_DecodeCodePageStateful()" * "PyUnicode_DecodeFSDefault()" * "PyUnicode_DecodeFSDefaultAndSize()" * "PyUnicode_DecodeLatin1()" * "PyUnicode_DecodeLocale()" * "PyUnicode_DecodeLocaleAndSize()" * "PyUnicode_DecodeMBCS()" * "PyUnicode_DecodeMBCSStateful()" * "PyUnicode_DecodeRawUnicodeEscape()" * "PyUnicode_DecodeUTF16()" * "PyUnicode_DecodeUTF16Stateful()" * "PyUnicode_DecodeUTF32()" * "PyUnicode_DecodeUTF32Stateful()" * "PyUnicode_DecodeUTF7()" * "PyUnicode_DecodeUTF7Stateful()" * "PyUnicode_DecodeUTF8()" * "PyUnicode_DecodeUTF8Stateful()" * "PyUnicode_DecodeUnicodeEscape()" * "PyUnicode_EncodeCodePage()" * "PyUnicode_EncodeFSDefault()" * "PyUnicode_EncodeLocale()" * "PyUnicode_EqualToUTF8()" * "PyUnicode_EqualToUTF8AndSize()" * "PyUnicode_FSConverter()" * "PyUnicode_FSDecoder()" * "PyUnicode_Find()" * "PyUnicode_FindChar()" * "PyUnicode_Format()" * "PyUnicode_FromEncodedObject()" * "PyUnicode_FromFormat()" * "PyUnicode_FromFormatV()" * "PyUnicode_FromObject()" * "PyUnicode_FromOrdinal()" * "PyUnicode_FromString()" * "PyUnicode_FromStringAndSize()" * "PyUnicode_FromWideChar()" * "PyUnicode_GetDefaultEncoding()" * "PyUnicode_GetLength()" * "PyUnicode_InternFromString()" * "PyUnicode_InternInPlace()" * "PyUnicode_IsIdentifier()" * "PyUnicode_Join()" * "PyUnicode_Partition()" * "PyUnicode_RPartition()" * "PyUnicode_RSplit()" * 
"PyUnicode_ReadChar()" * "PyUnicode_Replace()" * "PyUnicode_Resize()" * "PyUnicode_RichCompare()" * "PyUnicode_Split()" * "PyUnicode_Splitlines()" * "PyUnicode_Substring()" * "PyUnicode_Tailmatch()" * "PyUnicode_Translate()" * "PyUnicode_Type" * "PyUnicode_WriteChar()" * "PyVarObject" * "PyVarObject.ob_base" * "PyVarObject.ob_size" * "PyVectorcall_Call()" * "PyVectorcall_NARGS()" * "PyWeakReference" * "PyWeakref_GetObject()" * "PyWeakref_GetRef()" * "PyWeakref_NewProxy()" * "PyWeakref_NewRef()" * "PyWrapperDescr_Type" * "PyWrapper_New()" * "PyZip_Type" * "Py_AddPendingCall()" * "Py_AtExit()" * "Py_BEGIN_ALLOW_THREADS" * "Py_BLOCK_THREADS" * "Py_BuildValue()" * "Py_BytesMain()" * "Py_CompileString()" * "Py_DecRef()" * "Py_DecodeLocale()" * "Py_END_ALLOW_THREADS" * "Py_EncodeLocale()" * "Py_EndInterpreter()" * "Py_EnterRecursiveCall()" * "Py_Exit()" * "Py_FatalError()" * "Py_FileSystemDefaultEncodeErrors" * "Py_FileSystemDefaultEncoding" * "Py_Finalize()" * "Py_FinalizeEx()" * "Py_GenericAlias()" * "Py_GenericAliasType" * "Py_GetBuildInfo()" * "Py_GetCompiler()" * "Py_GetConstant()" * "Py_GetConstantBorrowed()" * "Py_GetCopyright()" * "Py_GetExecPrefix()" * "Py_GetPath()" * "Py_GetPlatform()" * "Py_GetPrefix()" * "Py_GetProgramFullPath()" * "Py_GetProgramName()" * "Py_GetPythonHome()" * "Py_GetRecursionLimit()" * "Py_GetVersion()" * "Py_HasFileSystemDefaultEncoding" * "Py_IncRef()" * "Py_Initialize()" * "Py_InitializeEx()" * "Py_Is()" * "Py_IsFalse()" * "Py_IsFinalizing()" * "Py_IsInitialized()" * "Py_IsNone()" * "Py_IsTrue()" * "Py_LeaveRecursiveCall()" * "Py_Main()" * "Py_MakePendingCalls()" * "Py_NewInterpreter()" * "Py_NewRef()" * "Py_ReprEnter()" * "Py_ReprLeave()" * "Py_SetProgramName()" * "Py_SetPythonHome()" * "Py_SetRecursionLimit()" * "Py_UCS4" * "Py_UNBLOCK_THREADS" * "Py_UTF8Mode" * "Py_VaBuildValue()" * "Py_Version" * "Py_XNewRef()" * "Py_buffer" * "Py_intptr_t" * "Py_ssize_t" * "Py_uintptr_t" * "allocfunc" * "binaryfunc" * "descrgetfunc" * "descrsetfunc" * "destructor" * "getattrfunc" * "getattrofunc" * "getbufferproc" * "getiterfunc" * "getter" * "hashfunc" * "initproc" * "inquiry" * "iternextfunc" * "lenfunc" * "newfunc" * "objobjargproc" * "objobjproc" * "releasebufferproc" * "reprfunc" * "richcmpfunc" * "setattrfunc" * "setattrofunc" * "setter" * "ssizeargfunc" * "ssizeobjargproc" * "ssizessizeargfunc" * "ssizessizeobjargproc" * "symtable" * "ternaryfunc" * "traverseproc" * "unaryfunc" * "vectorcallfunc" * "visitproc" Common Object Structures ************************ There are a large number of structures which are used in the definition of object types for Python. This section describes these structures and how they are used. Base object types and macros ============================ All Python objects ultimately share a small number of fields at the beginning of the object’s representation in memory. These are represented by the "PyObject" and "PyVarObject" types, which are defined, in turn, by the expansions of some macros also used, whether directly or indirectly, in the definition of all other Python objects. Additional macros can be found under reference counting. type PyObject * Part of the Limited API. (Only some members are part of the stable ABI.)* All object types are extensions of this type. This is a type which contains the information Python needs to treat a pointer to an object as an object. In a normal “release” build, it contains only the object’s reference count and a pointer to the corresponding type object. 
Nothing is actually declared to be a "PyObject", but every pointer to a Python object can be cast to a PyObject*. Access to the members must be done by using the macros "Py_REFCNT" and "Py_TYPE".

type PyVarObject
* Part of the Limited API. (Only some members are part of the stable ABI.)*

   This is an extension of "PyObject" that adds the "ob_size" field. This is only used for objects that have some notion of *length*. This type does not often appear in the Python/C API. Access to the members must be done by using the macros "Py_REFCNT", "Py_TYPE", and "Py_SIZE".

PyObject_HEAD

   This is a macro used when declaring new types which represent objects without a varying length. The PyObject_HEAD macro expands to:

      PyObject ob_base;

   See documentation of "PyObject" above.

PyObject_VAR_HEAD

   This is a macro used when declaring new types which represent objects with a length that varies from instance to instance. The PyObject_VAR_HEAD macro expands to:

      PyVarObject ob_base;

   See documentation of "PyVarObject" above.

PyTypeObject PyBaseObject_Type
* Part of the Stable ABI.*

   The base class of all other objects, the same as "object" in Python.

int Py_Is(PyObject *x, PyObject *y)
* Part of the Stable ABI since version 3.10.*

   Test if the *x* object is the *y* object, the same as "x is y" in Python.

   Added in version 3.10.

int Py_IsNone(PyObject *x)
* Part of the Stable ABI since version 3.10.*

   Test if an object is the "None" singleton, the same as "x is None" in Python.

   Added in version 3.10.

int Py_IsTrue(PyObject *x)
* Part of the Stable ABI since version 3.10.*

   Test if an object is the "True" singleton, the same as "x is True" in Python.

   Added in version 3.10.

int Py_IsFalse(PyObject *x)
* Part of the Stable ABI since version 3.10.*

   Test if an object is the "False" singleton, the same as "x is False" in Python.

   Added in version 3.10.

PyTypeObject *Py_TYPE(PyObject *o)
*Return value: Borrowed reference.*

   Get the type of the Python object *o*. Return a *borrowed reference*. Use the "Py_SET_TYPE()" function to set an object type.

   Changed in version 3.11: "Py_TYPE()" is changed to an inline static function. The parameter type is no longer const PyObject*.

int Py_IS_TYPE(PyObject *o, PyTypeObject *type)

   Return non-zero if the object *o* type is *type*. Return zero otherwise. Equivalent to: "Py_TYPE(o) == type".

   Added in version 3.9.

void Py_SET_TYPE(PyObject *o, PyTypeObject *type)

   Set the object *o* type to *type*.

   Added in version 3.9.

Py_ssize_t Py_SIZE(PyVarObject *o)

   Get the size of the Python object *o*. Use the "Py_SET_SIZE()" function to set an object size.

   Changed in version 3.11: "Py_SIZE()" is changed to an inline static function. The parameter type is no longer const PyVarObject*.

void Py_SET_SIZE(PyVarObject *o, Py_ssize_t size)

   Set the object *o* size to *size*.

   Added in version 3.9.

PyObject_HEAD_INIT(type)

   This is a macro which expands to initialization values for a new "PyObject" type. This macro expands to:

      _PyObject_EXTRA_INIT
      1, type,

PyVarObject_HEAD_INIT(type, size)

   This is a macro which expands to initialization values for a new "PyVarObject" type, including the "ob_size" field. This macro expands to:

      _PyObject_EXTRA_INIT
      1, type, size,

Implementing functions and methods
==================================

type PyCFunction
* Part of the Stable ABI.*

   Type of the functions used to implement most Python callables in C. Functions of this type take two PyObject* parameters and return one such value. If the return value is "NULL", an exception shall have been set.
If not "NULL", the return value is interpreted as the return value of the function as exposed in Python. The function must return a new reference. The function signature is: PyObject *PyCFunction(PyObject *self, PyObject *args); type PyCFunctionWithKeywords * Part of the Stable ABI.* Type of the functions used to implement Python callables in C with signature METH_VARARGS | METH_KEYWORDS. The function signature is: PyObject *PyCFunctionWithKeywords(PyObject *self, PyObject *args, PyObject *kwargs); type PyCFunctionFast * Part of the Stable ABI since version 3.13.* Type of the functions used to implement Python callables in C with signature "METH_FASTCALL". The function signature is: PyObject *PyCFunctionFast(PyObject *self, PyObject *const *args, Py_ssize_t nargs); type PyCFunctionFastWithKeywords * Part of the Stable ABI since version 3.13.* Type of the functions used to implement Python callables in C with signature METH_FASTCALL | METH_KEYWORDS. The function signature is: PyObject *PyCFunctionFastWithKeywords(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames); type PyCMethod Type of the functions used to implement Python callables in C with signature METH_METHOD | METH_FASTCALL | METH_KEYWORDS. The function signature is: PyObject *PyCMethod(PyObject *self, PyTypeObject *defining_class, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames) Added in version 3.9. type PyMethodDef * Part of the Stable ABI (including all members).* Structure used to describe a method of an extension type. This structure has four fields: const char *ml_name Name of the method. PyCFunction ml_meth Pointer to the C implementation. int ml_flags Flags bits indicating how the call should be constructed. const char *ml_doc Points to the contents of the docstring. The "ml_meth" is a C function pointer. The functions may be of different types, but they always return PyObject*. If the function is not of the "PyCFunction", the compiler will require a cast in the method table. Even though "PyCFunction" defines the first parameter as PyObject*, it is common that the method implementation uses the specific C type of the *self* object. The "ml_flags" field is a bitfield which can include the following flags. The individual flags indicate either a calling convention or a binding convention. There are these calling conventions: METH_VARARGS This is the typical calling convention, where the methods have the type "PyCFunction". The function expects two PyObject* values. The first one is the *self* object for methods; for module functions, it is the module object. The second parameter (often called *args*) is a tuple object representing all arguments. This parameter is typically processed using "PyArg_ParseTuple()" or "PyArg_UnpackTuple()". METH_KEYWORDS Can only be used in certain combinations with other flags: METH_VARARGS | METH_KEYWORDS, METH_FASTCALL | METH_KEYWORDS and METH_METHOD | METH_FASTCALL | METH_KEYWORDS. METH_VARARGS | METH_KEYWORDS Methods with these flags must be of type "PyCFunctionWithKeywords". The function expects three parameters: *self*, *args*, *kwargs* where *kwargs* is a dictionary of all the keyword arguments or possibly "NULL" if there are no keyword arguments. The parameters are typically processed using "PyArg_ParseTupleAndKeywords()". METH_FASTCALL Fast calling convention supporting only positional arguments. The methods have the type "PyCFunctionFast". 
   The first parameter is *self*, the second parameter is a C array of PyObject* values indicating the arguments, and the third parameter is the number of arguments (the length of the array).

   Added in version 3.7.

   Changed in version 3.10: "METH_FASTCALL" is now part of the stable ABI.

METH_FASTCALL | METH_KEYWORDS

   Extension of "METH_FASTCALL" that also supports keyword arguments, with methods of type "PyCFunctionFastWithKeywords". Keyword arguments are passed the same way as in the vectorcall protocol: there is an additional fourth PyObject* parameter which is a tuple representing the names of the keyword arguments (which are guaranteed to be strings) or possibly "NULL" if there are no keywords. The values of the keyword arguments are stored in the *args* array, after the positional arguments.

   Added in version 3.7.

METH_METHOD

   Can only be used in combination with other flags: METH_METHOD | METH_FASTCALL | METH_KEYWORDS.

METH_METHOD | METH_FASTCALL | METH_KEYWORDS

   Extension of METH_FASTCALL | METH_KEYWORDS supporting the *defining class*, that is, the class that contains the method in question. The defining class might be a superclass of "Py_TYPE(self)". The method needs to be of type "PyCMethod", the same as for "METH_FASTCALL | METH_KEYWORDS" with a "defining_class" argument added after "self".

   Added in version 3.9.

METH_NOARGS

   Methods without parameters don’t need to check whether arguments are given if they are listed with the "METH_NOARGS" flag. They need to be of type "PyCFunction". The first parameter is typically named *self* and will hold a reference to the module or object instance. In all cases the second parameter will be "NULL". The function must still have two parameters; since the second parameter is unused, "Py_UNUSED" can be used to prevent a compiler warning.

METH_O

   Methods with a single object argument can be listed with the "METH_O" flag, instead of invoking "PyArg_ParseTuple()" with a "O" argument. They have the type "PyCFunction", with the *self* parameter, and a PyObject* parameter representing the single argument.

These two constants are not used to indicate the calling convention but the binding convention when used with methods of classes. These may not be used for functions defined for modules. At most one of these flags may be set for any given method.

METH_CLASS

   The method will be passed the type object as the first parameter rather than an instance of the type. This is used to create *class methods*, similar to what is created when using the "classmethod()" built-in function.

METH_STATIC

   The method will be passed "NULL" as the first parameter rather than an instance of the type. This is used to create *static methods*, similar to what is created when using the "staticmethod()" built-in function.

One other constant controls whether a method is loaded in place of another definition with the same method name.

METH_COEXIST

   The method will be loaded in place of existing definitions. Without *METH_COEXIST*, the default is to skip repeated definitions. Since slot wrappers are loaded before the method table, the existence of a *sq_contains* slot, for example, would generate a wrapped method named "__contains__()" and preclude the loading of a corresponding PyCFunction with the same name. With the flag defined, the PyCFunction will be loaded in place of the wrapper object and will co-exist with the slot. This is helpful because calls to PyCFunctions are optimized more than wrapper object calls.
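For a concrete picture of how these conventions combine in a method table, here is a minimal sketch; the function names ("spam_noargs", "spam_single", "spam_keywords") and their behavior are illustrative assumptions, not part of the API:

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* METH_NOARGS: the second parameter is always NULL. */
   static PyObject *
   spam_noargs(PyObject *self, PyObject *Py_UNUSED(ignored))
   {
       Py_RETURN_NONE;
   }

   /* METH_O: the single argument is passed directly, without a tuple. */
   static PyObject *
   spam_single(PyObject *self, PyObject *arg)
   {
       return PyObject_Repr(arg);  /* new reference, or NULL on error */
   }

   /* METH_VARARGS | METH_KEYWORDS: args is a tuple, kwargs a dict or NULL. */
   static PyObject *
   spam_keywords(PyObject *self, PyObject *args, PyObject *kwargs)
   {
       static char *kwlist[] = {"x", "y", NULL};
       long x = 0, y = 0;
       if (!PyArg_ParseTupleAndKeywords(args, kwargs, "l|l", kwlist, &x, &y))
           return NULL;
       return PyLong_FromLong(x + y);
   }

   static PyMethodDef spam_methods[] = {
       {"noargs", spam_noargs, METH_NOARGS, "Take no arguments."},
       {"single", spam_single, METH_O, "Take exactly one argument."},
       {"keywords", (PyCFunction)(void (*)(void))spam_keywords,
        METH_VARARGS | METH_KEYWORDS, "Take two optional keyword arguments."},
       {NULL, NULL, 0, NULL}  /* Sentinel */
   };

Note the cast on the METH_VARARGS | METH_KEYWORDS entry: "ml_meth" is declared as "PyCFunction", so functions of any other type need a cast in the table, as mentioned above.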
PyObject *PyCMethod_New(PyMethodDef *ml, PyObject *self, PyObject *module, PyTypeObject *cls) *Return value: New reference.** Part of the Stable ABI since version 3.9.* Turn *ml* into a Python *callable* object. The caller must ensure that *ml* outlives the *callable*. Typically, *ml* is defined as a static variable. The *self* parameter will be passed as the *self* argument to the C function in "ml->ml_meth" when invoked. *self* can be "NULL". The *callable* object’s "__module__" attribute can be set from the given *module* argument. *module* should be a Python string, which will be used as name of the module the function is defined in. If unavailable, it can be set to "None" or "NULL". See also: "function.__module__" The *cls* parameter will be passed as the *defining_class* argument to the C function. Must be set if "METH_METHOD" is set on "ml->ml_flags". Added in version 3.9. PyObject *PyCFunction_NewEx(PyMethodDef *ml, PyObject *self, PyObject *module) *Return value: New reference.** Part of the Stable ABI.* Equivalent to "PyCMethod_New(ml, self, module, NULL)". PyObject *PyCFunction_New(PyMethodDef *ml, PyObject *self) *Return value: New reference.** Part of the Stable ABI since version 3.4.* Equivalent to "PyCMethod_New(ml, self, NULL, NULL)". Accessing attributes of extension types ======================================= type PyMemberDef * Part of the Stable ABI (including all members).* Structure which describes an attribute of a type which corresponds to a C struct member. When defining a class, put a NULL-terminated array of these structures in the "tp_members" slot. Its fields are, in order: const char *name Name of the member. A NULL value marks the end of a "PyMemberDef[]" array. The string should be static, no copy is made of it. int type The type of the member in the C struct. See Member types for the possible values. Py_ssize_t offset The offset in bytes that the member is located on the type’s object struct. int flags Zero or more of the Member flags, combined using bitwise OR. const char *doc The docstring, or NULL. The string should be static, no copy is made of it. Typically, it is defined using "PyDoc_STR". By default (when "flags" is "0"), members allow both read and write access. Use the "Py_READONLY" flag for read-only access. Certain types, like "Py_T_STRING", imply "Py_READONLY". Only "Py_T_OBJECT_EX" (and legacy "T_OBJECT") members can be deleted. For heap-allocated types (created using "PyType_FromSpec()" or similar), "PyMemberDef" may contain a definition for the special member ""__vectorcalloffset__"", corresponding to "tp_vectorcall_offset" in type objects. These must be defined with "Py_T_PYSSIZET" and "Py_READONLY", for example: static PyMemberDef spam_type_members[] = { {"__vectorcalloffset__", Py_T_PYSSIZET, offsetof(Spam_object, vectorcall), Py_READONLY}, {NULL} /* Sentinel */ }; (You may need to "#include " for "offsetof()".) The legacy offsets "tp_dictoffset" and "tp_weaklistoffset" can be defined similarly using ""__dictoffset__"" and ""__weaklistoffset__"" members, but extensions are strongly encouraged to use "Py_TPFLAGS_MANAGED_DICT" and "Py_TPFLAGS_MANAGED_WEAKREF" instead. Changed in version 3.12: "PyMemberDef" is always available. Previously, it required including ""structmember.h"". PyObject *PyMember_GetOne(const char *obj_addr, struct PyMemberDef *m) * Part of the Stable ABI.* Get an attribute belonging to the object at address *obj_addr*. The attribute is described by "PyMemberDef" *m*. Returns "NULL" on error. 
   Changed in version 3.12: "PyMember_GetOne" is always available. Previously, it required including "structmember.h".

int PyMember_SetOne(char *obj_addr, struct PyMemberDef *m, PyObject *o)
* Part of the Stable ABI.*

   Set an attribute belonging to the object at address *obj_addr* to object *o*. The attribute to set is described by "PyMemberDef" *m*. Returns "0" if successful and a negative value on failure.

   Changed in version 3.12: "PyMember_SetOne" is always available. Previously, it required including "structmember.h".

Member flags
------------

The following flags can be used with "PyMemberDef.flags":

Py_READONLY

   Not writable.

Py_AUDIT_READ

   Emit an "object.__getattr__" audit event before reading.

Py_RELATIVE_OFFSET

   Indicates that the "offset" of this "PyMemberDef" entry is an offset from the subclass-specific data, rather than from "PyObject". Can only be used as part of the "Py_tp_members" slot when creating a class using negative "basicsize". It is mandatory in that case.

   This flag is only used in "PyType_Slot". When setting "tp_members" during class creation, Python clears it and sets "PyMemberDef.offset" to the offset from the "PyObject" struct.

Changed in version 3.10: The "RESTRICTED", "READ_RESTRICTED" and "WRITE_RESTRICTED" macros available with "#include "structmember.h"" are deprecated. "READ_RESTRICTED" and "RESTRICTED" are equivalent to "Py_AUDIT_READ"; "WRITE_RESTRICTED" does nothing.

Changed in version 3.12: The "READONLY" macro was renamed to "Py_READONLY". The "PY_AUDIT_READ" macro was renamed with the "Py_" prefix. The new names are now always available. Previously, these required "#include "structmember.h"". The header is still available and provides the old names.

Member types
------------

"PyMemberDef.type" can be one of the following macros corresponding to various C types. When the member is accessed in Python, it will be converted to the equivalent Python type. When it is set from Python, it will be converted back to the C type. If that is not possible, an exception such as "TypeError" or "ValueError" is raised.

Unless marked (D), attributes defined this way cannot be deleted using e.g. "del" or "delattr()".
+----------------------+----------------------------+---------------+
| Macro name           | C type                     | Python type   |
|======================|============================|===============|
| Py_T_BYTE            | char                       | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_SHORT           | short                      | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_INT             | int                        | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_LONG            | long                       | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_LONGLONG        | long long                  | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_UBYTE           | unsigned char              | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_UINT            | unsigned int               | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_USHORT          | unsigned short             | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_ULONG           | unsigned long              | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_ULONGLONG       | unsigned long long         | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_PYSSIZET        | Py_ssize_t                 | "int"         |
+----------------------+----------------------------+---------------+
| Py_T_FLOAT           | float                      | "float"       |
+----------------------+----------------------------+---------------+
| Py_T_DOUBLE          | double                     | "float"       |
+----------------------+----------------------------+---------------+
| Py_T_BOOL            | char (written as 0 or 1)   | "bool"        |
+----------------------+----------------------------+---------------+
| Py_T_STRING          | const char* (*)            | "str" (RO)    |
+----------------------+----------------------------+---------------+
| Py_T_STRING_INPLACE  | const char[] (*)           | "str" (RO)    |
+----------------------+----------------------------+---------------+
| Py_T_CHAR            | char (0-127)               | "str" (**)    |
+----------------------+----------------------------+---------------+
| Py_T_OBJECT_EX       | PyObject*                  | "object" (D)  |
+----------------------+----------------------------+---------------+

(*): Zero-terminated, UTF8-encoded C string. With "Py_T_STRING" the C representation is a pointer; with "Py_T_STRING_INPLACE" the string is stored directly in the structure.

(**): String of length 1. Only ASCII is accepted.

(RO): Implies "Py_READONLY".

(D): Can be deleted, in which case the pointer is set to "NULL". Reading a "NULL" pointer raises "AttributeError".

Added in version 3.12: In previous versions, the macros were only available with "#include "structmember.h"" and were named without the "Py_" prefix (e.g. as "T_INT"). The header is still available and contains the old names, along with the following deprecated types:

T_OBJECT

   Like "Py_T_OBJECT_EX", but "NULL" is converted to "None". This results in surprising behavior in Python: deleting the attribute effectively sets it to "None".

T_NONE

   Always "None". Must be used with "Py_READONLY".
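Putting the member types together, here is a sketch of a C struct exposed through "tp_members"; the "CounterObject" struct and its attribute names are hypothetical, and the "Py_T_*" names assume Python 3.12 or later:

   #include <Python.h>
   #include <stddef.h>  /* offsetof() */

   typedef struct {
       PyObject_HEAD
       int number;         /* exposed as a read/write "int" */
       double ratio;       /* exposed as a read-only "float" */
       PyObject *payload;  /* exposed as a deletable object attribute */
   } CounterObject;

   static PyMemberDef counter_members[] = {
       {"number", Py_T_INT, offsetof(CounterObject, number), 0,
        PyDoc_STR("a writable integer")},
       {"ratio", Py_T_DOUBLE, offsetof(CounterObject, ratio), Py_READONLY,
        PyDoc_STR("a read-only float")},
       {"payload", Py_T_OBJECT_EX, offsetof(CounterObject, payload), 0,
        PyDoc_STR("an object attribute; may be deleted")},
       {NULL}  /* Sentinel */
   };

Because "payload" is declared with "Py_T_OBJECT_EX", deleting the attribute sets the pointer to "NULL" and a later read raises "AttributeError", as described in the table above.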
Defining Getters and Setters
----------------------------

type PyGetSetDef
* Part of the Stable ABI (including all members).*

   Structure to define property-like access for a type. See also description of the "PyTypeObject.tp_getset" slot.

   const char *name
      attribute name

   getter get
      C function to get the attribute.

   setter set
      Optional C function to set or delete the attribute. If "NULL", the attribute is read-only.

   const char *doc
      optional docstring

   void *closure
      Optional user data pointer, providing additional data for getter and setter.

typedef PyObject *(*getter)(PyObject*, void*)
* Part of the Stable ABI.*

   The "get" function takes one PyObject* parameter (the instance) and a user data pointer (the associated "closure"). It should return a new reference on success or "NULL" with a set exception on failure.

typedef int (*setter)(PyObject*, PyObject*, void*)
* Part of the Stable ABI.*

   "set" functions take two PyObject* parameters (the instance and the value to be set) and a user data pointer (the associated "closure"). If the attribute should be deleted, the second parameter is "NULL". Should return "0" on success or "-1" with a set exception on failure.
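As an illustration of the two signatures, here is a minimal getter/setter pair for a hypothetical "MeasureObject" that stores a single C double; the names are assumptions, and *closure* is unused:

   typedef struct {
       PyObject_HEAD
       double value;
   } MeasureObject;

   /* "get" function: return a new reference, or NULL with an exception set. */
   static PyObject *
   measure_get_value(PyObject *self, void *closure)
   {
       return PyFloat_FromDouble(((MeasureObject *)self)->value);
   }

   /* "set" function: value is NULL when the attribute is being deleted. */
   static int
   measure_set_value(PyObject *self, PyObject *value, void *closure)
   {
       if (value == NULL) {
           PyErr_SetString(PyExc_AttributeError, "cannot delete 'value'");
           return -1;
       }
       double d = PyFloat_AsDouble(value);
       if (d == -1.0 && PyErr_Occurred())
           return -1;
       ((MeasureObject *)self)->value = d;
       return 0;
   }

   static PyGetSetDef measure_getset[] = {
       {"value", measure_get_value, measure_set_value,
        PyDoc_STR("the measured value"), NULL},
       {NULL}  /* Sentinel */
   };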
Operating System Utilities
**************************

PyObject *PyOS_FSPath(PyObject *path)
*Return value: New reference.** Part of the Stable ABI since version 3.6.*

   Return the file system representation for *path*. If the object is a "str" or "bytes" object, then a new *strong reference* is returned. If the object implements the "os.PathLike" interface, then "__fspath__()" is returned as long as it is a "str" or "bytes" object. Otherwise "TypeError" is raised and "NULL" is returned.

   Added in version 3.6.

int Py_FdIsInteractive(FILE *fp, const char *filename)

   Return true (nonzero) if the standard I/O file *fp* with name *filename* is deemed interactive. This is the case for files for which "isatty(fileno(fp))" is true. If "PyConfig.interactive" is non-zero, this function also returns true if the *filename* pointer is "NULL" or if the name is equal to one of the strings "''" or "'???'".

   This function must not be called before Python is initialized.

void PyOS_BeforeFork()
* Part of the Stable ABI on platforms with fork() since version 3.7.*

   Function to prepare some internal state before a process fork. This should be called before calling "fork()" or any similar function that clones the current process. Only available on systems where "fork()" is defined.

   Warning: The C "fork()" call should only be made from the “main” thread (of the “main” interpreter). The same is true for "PyOS_BeforeFork()".

   Added in version 3.7.

void PyOS_AfterFork_Parent()
* Part of the Stable ABI on platforms with fork() since version 3.7.*

   Function to update some internal state after a process fork. This should be called from the parent process after calling "fork()" or any similar function that clones the current process, regardless of whether process cloning was successful. Only available on systems where "fork()" is defined.

   Warning: The C "fork()" call should only be made from the “main” thread (of the “main” interpreter). The same is true for "PyOS_AfterFork_Parent()".

   Added in version 3.7.

void PyOS_AfterFork_Child()
* Part of the Stable ABI on platforms with fork() since version 3.7.*

   Function to update internal interpreter state after a process fork. This must be called from the child process after calling "fork()", or any similar function that clones the current process, if there is any chance the process will call back into the Python interpreter. Only available on systems where "fork()" is defined.

   Warning: The C "fork()" call should only be made from the “main” thread (of the “main” interpreter). The same is true for "PyOS_AfterFork_Child()".

   Added in version 3.7.

   See also: "os.register_at_fork()" allows registering custom Python functions to be called by "PyOS_BeforeFork()", "PyOS_AfterFork_Parent()" and "PyOS_AfterFork_Child()".

void PyOS_AfterFork()
* Part of the Stable ABI on platforms with fork().*

   Function to update some internal state after a process fork; this should be called in the new process if the Python interpreter will continue to be used. If a new executable is loaded into the new process, this function does not need to be called.

   Deprecated since version 3.7: This function is superseded by "PyOS_AfterFork_Child()".

int PyOS_CheckStack()
* Part of the Stable ABI on platforms with USE_STACKCHECK since version 3.7.*

   Return true when the interpreter runs out of stack space. This is a reliable check, but is only available when "USE_STACKCHECK" is defined (currently on certain versions of Windows using the Microsoft Visual C++ compiler). "USE_STACKCHECK" will be defined automatically; you should never change the definition in your own code.

typedef void (*PyOS_sighandler_t)(int)
* Part of the Stable ABI.*

PyOS_sighandler_t PyOS_getsig(int i)
* Part of the Stable ABI.*

   Return the current signal handler for signal *i*. This is a thin wrapper around either "sigaction()" or "signal()". Do not call those functions directly!

PyOS_sighandler_t PyOS_setsig(int i, PyOS_sighandler_t h)
* Part of the Stable ABI.*

   Set the signal handler for signal *i* to be *h*; return the old signal handler. This is a thin wrapper around either "sigaction()" or "signal()". Do not call those functions directly!

wchar_t *Py_DecodeLocale(const char *arg, size_t *size)
* Part of the Stable ABI since version 3.7.*

   Warning: This function should not be called directly: use the "PyConfig" API with the "PyConfig_SetBytesString()" function, which ensures that Python is preinitialized. This function must not be called before Python is preinitialized, so that the LC_CTYPE locale is properly configured: see the "Py_PreInitialize()" function.

   Decode a byte string from the *filesystem encoding and error handler*. If the error handler is the surrogateescape error handler, undecodable bytes are decoded as characters in range U+DC80..U+DCFF; and if a byte sequence can be decoded as a surrogate character, the bytes are escaped using the surrogateescape error handler instead of decoding them.

   Return a pointer to a newly allocated wide character string; use "PyMem_RawFree()" to free the memory. If *size* is not "NULL", the number of wide characters, excluding the null character, is written into "*size".

   Return "NULL" on decoding error or memory allocation error. If *size* is not "NULL", "*size" is set to "(size_t)-1" on memory error or set to "(size_t)-2" on decoding error.

   The *filesystem encoding and error handler* are selected by "PyConfig_Read()": see "filesystem_encoding" and "filesystem_errors" members of "PyConfig".

   Decoding errors should never happen, unless there is a bug in the C library.

   Use the "Py_EncodeLocale()" function to encode the character string back to a byte string.

   See also: The "PyUnicode_DecodeFSDefaultAndSize()" and "PyUnicode_DecodeLocaleAndSize()" functions.

   Added in version 3.5.

   Changed in version 3.7: The function now uses the UTF-8 encoding in the Python UTF-8 Mode.
   Changed in version 3.8: The function now uses the UTF-8 encoding on Windows if "PyPreConfig.legacy_windows_fs_encoding" is zero.

char *Py_EncodeLocale(const wchar_t *text, size_t *error_pos)
* Part of the Stable ABI since version 3.7.*

   Encode a wide character string to the *filesystem encoding and error handler*. If the error handler is the surrogateescape error handler, surrogate characters in the range U+DC80..U+DCFF are converted to bytes 0x80..0xFF.

   Return a pointer to a newly allocated byte string; use "PyMem_Free()" to free the memory. Return "NULL" on encoding error or memory allocation error.

   If *error_pos* is not "NULL", "*error_pos" is set to "(size_t)-1" on success, or set to the index of the invalid character on encoding error.

   The *filesystem encoding and error handler* are selected by "PyConfig_Read()": see "filesystem_encoding" and "filesystem_errors" members of "PyConfig".

   Use the "Py_DecodeLocale()" function to decode the byte string back to a wide character string.

   Warning: This function must not be called before Python is preinitialized, so that the LC_CTYPE locale is properly configured: see the "Py_PreInitialize()" function.

   See also: The "PyUnicode_EncodeFSDefault()" and "PyUnicode_EncodeLocale()" functions.

   Added in version 3.5.

   Changed in version 3.7: The function now uses the UTF-8 encoding in the Python UTF-8 Mode.

   Changed in version 3.8: The function now uses the UTF-8 encoding on Windows if "PyPreConfig.legacy_windows_fs_encoding" is zero.

System Functions
****************

These are utility functions that make functionality from the "sys" module accessible to C code. They all work with the current interpreter thread’s "sys" module’s dict, which is contained in the internal thread state structure.

PyObject *PySys_GetObject(const char *name)
*Return value: Borrowed reference.** Part of the Stable ABI.*

   Return the object *name* from the "sys" module or "NULL" if it does not exist, without setting an exception.

int PySys_SetObject(const char *name, PyObject *v)
* Part of the Stable ABI.*

   Set *name* in the "sys" module to *v* unless *v* is "NULL", in which case *name* is deleted from the sys module. Returns "0" on success, "-1" on error.

void PySys_ResetWarnOptions()
* Part of the Stable ABI.*

   Reset "sys.warnoptions" to an empty list. This function may be called prior to "Py_Initialize()".

   Deprecated since version 3.13, will be removed in version 3.15: Clear "sys.warnoptions" and "warnings.filters" instead.

void PySys_WriteStdout(const char *format, ...)
* Part of the Stable ABI.*

   Write the output string described by *format* to "sys.stdout". No exceptions are raised, even if truncation occurs (see below).

   *format* should limit the total size of the formatted output string to 1000 bytes or less – after 1000 bytes, the output string is truncated. In particular, this means that no unrestricted “%s” formats should occur; these should be limited using “%.<N>s” where <N> is a decimal number calculated so that <N> plus the maximum size of other formatted text does not exceed 1000 bytes. Also watch out for “%f”, which can print hundreds of digits for very large numbers.

   If a problem occurs, or "sys.stdout" is unset, the formatted message is written to the real (C level) *stdout*.

void PySys_WriteStderr(const char *format, ...)
* Part of the Stable ABI.*

   As "PySys_WriteStdout()", but write to "sys.stderr" or *stderr* instead.

void PySys_FormatStdout(const char *format, ...)
* Part of the Stable ABI.*

   Similar to "PySys_WriteStdout()", but format the message using "PyUnicode_FromFormatV()" and do not truncate the message to an arbitrary length.

   Added in version 3.2.

void PySys_FormatStderr(const char *format, ...)
* Part of the Stable ABI.*

   As "PySys_FormatStdout()", but write to "sys.stderr" or *stderr* instead.

   Added in version 3.2.

PyObject *PySys_GetXOptions()
*Return value: Borrowed reference.** Part of the Stable ABI since version 3.7.*

   Return the current dictionary of "-X" options, similarly to "sys._xoptions". On error, "NULL" is returned and an exception is set.

   Added in version 3.2.

int PySys_Audit(const char *event, const char *format, ...)
* Part of the Stable ABI since version 3.13.*

   Raise an auditing event with any active hooks. Return zero for success and non-zero with an exception set on failure.

   The *event* string argument must not be *NULL*.

   If any hooks have been added, *format* and other arguments will be used to construct a tuple to pass. Apart from "N", the same format characters as used in "Py_BuildValue()" are available. If the built value is not a tuple, it will be added into a single-element tuple.

   The "N" format option must not be used. It consumes a reference, but since there is no way to know whether arguments to this function will be consumed, using it may cause reference leaks.

   Note that "#" format characters should always be treated as "Py_ssize_t", regardless of whether "PY_SSIZE_T_CLEAN" was defined.

   "sys.audit()" performs the same function from Python code. See also "PySys_AuditTuple()".

   Added in version 3.8.

   Changed in version 3.8.2: Require "Py_ssize_t" for "#" format characters. Previously, an unavoidable deprecation warning was raised.

int PySys_AuditTuple(const char *event, PyObject *args)
* Part of the Stable ABI since version 3.13.*

   Similar to "PySys_Audit()", but pass arguments as a Python object. *args* must be a "tuple". To pass no arguments, *args* can be *NULL*.

   Added in version 3.13.

int PySys_AddAuditHook(Py_AuditHookFunction hook, void *userData)

   Append the callable *hook* to the list of active auditing hooks. Return zero on success and non-zero on failure. If the runtime has been initialized, also set an error on failure. Hooks added through this API are called for all interpreters created by the runtime.

   The *userData* pointer is passed into the hook function. Since hook functions may be called from different runtimes, this pointer should not refer directly to Python state.

   This function is safe to call before "Py_Initialize()". When called after runtime initialization, existing audit hooks are notified and may silently abort the operation by raising an error subclassed from "Exception" (other errors will not be silenced).

   The hook function is always called with the GIL held by the Python interpreter that raised the event.

   See **PEP 578** for a detailed description of auditing. Functions in the runtime and standard library that raise events are listed in the audit events table. Details are in each function’s documentation.

   If the interpreter is initialized, this function raises an auditing event "sys.addaudithook" with no arguments. If any existing hooks raise an exception derived from "Exception", the new hook will not be added and the exception is cleared. As a result, callers cannot assume that their hook has been added unless they control all existing hooks.

typedef int (*Py_AuditHookFunction)(const char *event, PyObject *args, void *userData)

   The type of the hook function. *event* is the C string event argument passed to "PySys_Audit()" or "PySys_AuditTuple()". *args* is guaranteed to be a "PyTupleObject". *userData* is the argument passed to "PySys_AddAuditHook()".

   Added in version 3.8.
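As a sketch of how these pieces fit together (the hook body, the "example.event" name, and the use of *stderr* as *userData* are illustrative assumptions, not prescribed usage):

   #include <Python.h>
   #include <stdio.h>
   #include <string.h>

   /* Log every event; additionally refuse further audit hooks by
      aborting the documented "sys.addaudithook" event. */
   static int
   my_audit_hook(const char *event, PyObject *args, void *userData)
   {
       FILE *log = (FILE *)userData;  /* non-Python state, as advised above */
       fprintf(log, "audit: %s\n", event);
       if (strcmp(event, "sys.addaudithook") == 0) {
           PyErr_SetString(PyExc_RuntimeError, "no new audit hooks allowed");
           return -1;
       }
       return 0;
   }

   int
   main(void)
   {
       /* Safe to install before Py_Initialize(). */
       if (PySys_AddAuditHook(my_audit_hook, stderr) != 0)
           return 1;
       Py_Initialize();
       /* Raise a custom event; "i" builds an int, wrapped in a 1-tuple. */
       if (PySys_Audit("example.event", "i", 42) != 0)
           PyErr_Print();
       return Py_FinalizeEx() < 0 ? 120 : 0;
   }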
Process Control
***************

void Py_FatalError(const char *message)
* Part of the Stable ABI.*

   Print a fatal error message and kill the process. No cleanup is performed. This function should only be invoked when a condition is detected that would make it dangerous to continue using the Python interpreter; e.g., when the object administration appears to be corrupted. On Unix, the standard C library function "abort()" is called, which will attempt to produce a "core" file.

   The "Py_FatalError()" function is replaced with a macro that automatically logs the name of the current function, unless the "Py_LIMITED_API" macro is defined.

   Changed in version 3.9: Log the function name automatically.

void Py_Exit(int status)
* Part of the Stable ABI.*

   Exit the current process. This calls "Py_FinalizeEx()" and then calls the standard C library function "exit(status)". If "Py_FinalizeEx()" indicates an error, the exit status is set to 120.

   Changed in version 3.6: Errors from finalization are no longer ignored.

int Py_AtExit(void (*func)())
* Part of the Stable ABI.*

   Register a cleanup function to be called by "Py_FinalizeEx()". The cleanup function will be called with no arguments and should return no value. At most 32 cleanup functions can be registered. When the registration is successful, "Py_AtExit()" returns "0"; on failure, it returns "-1". The cleanup function registered last is called first. Each cleanup function will be called at most once. Since Python’s internal finalization will have completed before the cleanup function, no Python APIs should be called by *func*.

   See also: "PyUnstable_AtExit()" for passing a "void *data" argument.

PyTime C API
************

Added in version 3.13.

The clock C API provides access to system clocks. It is similar to the Python "time" module. For C API related to the "datetime" module, see DateTime Objects.

Types
=====

type PyTime_t

   A timestamp or duration in nanoseconds, represented as a signed 64-bit integer.

   The reference point for timestamps depends on the clock used. For example, "PyTime_Time()" returns timestamps relative to the UNIX epoch.

   The supported range is around [-292.3 years; +292.3 years]. Using the Unix epoch (January 1st, 1970) as reference, the supported date range is around [1677-09-21; 2262-04-11]. The exact limits are exposed as constants:

PyTime_t PyTime_MIN

   Minimum value of "PyTime_t".

PyTime_t PyTime_MAX

   Maximum value of "PyTime_t".

Clock Functions
===============

The following functions take a pointer to a PyTime_t that they set to the value of a particular clock. Details of each clock are given in the documentation of the corresponding Python function.

The functions return "0" on success, or "-1" (with an exception set) on failure. On integer overflow, they set the "PyExc_OverflowError" exception and set "*result" to the value clamped to the "[PyTime_MIN; PyTime_MAX]" range. (On current systems, integer overflows are likely caused by misconfigured system time.)

Like any other C API (unless otherwise specified), the functions must be called with the *GIL* held.

int PyTime_Monotonic(PyTime_t *result)

   Read the monotonic clock. See "time.monotonic()" for important details on this clock.

int PyTime_PerfCounter(PyTime_t *result)

   Read the performance counter. See "time.perf_counter()" for important details on this clock.
int PyTime_Time(PyTime_t *result)

   Read the “wall clock” time. See "time.time()" for important details on this clock.

Raw Clock Functions
===================

Similar to clock functions, but don’t set an exception on error and don’t require the caller to hold the GIL.

On success, the functions return "0". On failure, they set "*result" to "0" and return "-1", *without* setting an exception. To get the cause of the error, acquire the GIL and call the regular (non-"Raw") function. Note that the regular function may succeed after the "Raw" one failed.

int PyTime_MonotonicRaw(PyTime_t *result)

   Similar to "PyTime_Monotonic()", but don’t set an exception on error and don’t require holding the GIL.

int PyTime_PerfCounterRaw(PyTime_t *result)

   Similar to "PyTime_PerfCounter()", but don’t set an exception on error and don’t require holding the GIL.

int PyTime_TimeRaw(PyTime_t *result)

   Similar to "PyTime_Time()", but don’t set an exception on error and don’t require holding the GIL.

Conversion functions
====================

double PyTime_AsSecondsDouble(PyTime_t t)

   Convert a timestamp to a number of seconds as a C double. The function cannot fail, but note that double has limited accuracy for large values.
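To sketch how the clock and conversion functions combine (assuming Python 3.13 or later, and that the GIL is held as the non-"Raw" functions require):

   #include <Python.h>
   #include <stdio.h>

   /* Measure an elapsed interval with the monotonic clock and report it
      in seconds. PyTime_t is a signed 64-bit nanosecond count, so the
      difference of two nearby timestamps is safe to compute directly. */
   static int
   report_elapsed(void)
   {
       PyTime_t start, stop;
       if (PyTime_Monotonic(&start) < 0)
           return -1;  /* exception set */
       /* ... do some work ... */
       if (PyTime_Monotonic(&stop) < 0)
           return -1;
       printf("elapsed: %f seconds\n", PyTime_AsSecondsDouble(stop - start));
       return 0;
   }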
Tuple Objects
*************

type PyTupleObject

   This subtype of "PyObject" represents a Python tuple object.

PyTypeObject PyTuple_Type
* Part of the Stable ABI.*

   This instance of "PyTypeObject" represents the Python tuple type; it is the same object as "tuple" in the Python layer.

int PyTuple_Check(PyObject *p)

   Return true if *p* is a tuple object or an instance of a subtype of the tuple type. This function always succeeds.

int PyTuple_CheckExact(PyObject *p)

   Return true if *p* is a tuple object, but not an instance of a subtype of the tuple type. This function always succeeds.

PyObject *PyTuple_New(Py_ssize_t len)
*Return value: New reference.** Part of the Stable ABI.*

   Return a new tuple object of size *len*, or "NULL" with an exception set on failure.

PyObject *PyTuple_Pack(Py_ssize_t n, ...)
*Return value: New reference.** Part of the Stable ABI.*

   Return a new tuple object of size *n*, or "NULL" with an exception set on failure. The tuple values are initialized to the subsequent *n* C arguments pointing to Python objects. "PyTuple_Pack(2, a, b)" is equivalent to "Py_BuildValue("(OO)", a, b)".

Py_ssize_t PyTuple_Size(PyObject *p)
* Part of the Stable ABI.*

   Take a pointer to a tuple object, and return the size of that tuple. On error, return "-1" with an exception set.

Py_ssize_t PyTuple_GET_SIZE(PyObject *p)

   Like "PyTuple_Size()", but without error checking.

PyObject *PyTuple_GetItem(PyObject *p, Py_ssize_t pos)
*Return value: Borrowed reference.** Part of the Stable ABI.*

   Return the object at position *pos* in the tuple pointed to by *p*. If *pos* is negative or out of bounds, return "NULL" and set an "IndexError" exception.

   The returned reference is borrowed from the tuple *p* (that is: it is only valid as long as you hold a reference to *p*). To get a *strong reference*, use "Py_NewRef(PyTuple_GetItem(...))" or "PySequence_GetItem()".

PyObject *PyTuple_GET_ITEM(PyObject *p, Py_ssize_t pos)
*Return value: Borrowed reference.*

   Like "PyTuple_GetItem()", but does no checking of its arguments.

PyObject *PyTuple_GetSlice(PyObject *p, Py_ssize_t low, Py_ssize_t high)
*Return value: New reference.** Part of the Stable ABI.*

   Return the slice of the tuple pointed to by *p* between *low* and *high*, or "NULL" with an exception set on failure. This is the equivalent of the Python expression "p[low:high]". Indexing from the end of the tuple is not supported.

int PyTuple_SetItem(PyObject *p, Py_ssize_t pos, PyObject *o)
* Part of the Stable ABI.*

   Insert a reference to object *o* at position *pos* of the tuple pointed to by *p*. Return "0" on success. If *pos* is out of bounds, return "-1" and set an "IndexError" exception.

   Note: This function “steals” a reference to *o* and discards a reference to an item already in the tuple at the affected position.

void PyTuple_SET_ITEM(PyObject *p, Py_ssize_t pos, PyObject *o)

   Like "PyTuple_SetItem()", but does no error checking, and should *only* be used to fill in brand new tuples. Bounds checking is performed as an assertion if Python is built in debug mode or "with assertions".

   Note: This function “steals” a reference to *o*, and, unlike "PyTuple_SetItem()", does *not* discard a reference to any item that is being replaced; any reference in the tuple at position *pos* will be leaked.

int _PyTuple_Resize(PyObject **p, Py_ssize_t newsize)

   Can be used to resize a tuple. *newsize* will be the new length of the tuple. Because tuples are *supposed* to be immutable, this should only be used if there is only one reference to the object. Do *not* use this if the tuple may already be known to some other part of the code. The tuple will always grow or shrink at the end. Think of this as destroying the old tuple and creating a new one, only more efficiently. Returns "0" on success. Client code should never assume that the resulting value of "*p" will be the same as before calling this function. If the object referenced by "*p" is replaced, the original "*p" is destroyed. On failure, returns "-1" and sets "*p" to "NULL", and raises "MemoryError" or "SystemError".
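For illustration, here is a hypothetical helper that builds the tuple "(1, 'two')" with these functions, relying on the reference-stealing behavior of "PyTuple_SetItem()" described above (the reference to the item is consumed even when the call fails):

   static PyObject *
   make_pair(void)
   {
       PyObject *t = PyTuple_New(2);
       if (t == NULL)
           return NULL;
       PyObject *first = PyLong_FromLong(1);
       if (first == NULL || PyTuple_SetItem(t, 0, first) < 0) {
           Py_DECREF(t);
           return NULL;
       }
       PyObject *second = PyUnicode_FromString("two");
       if (second == NULL || PyTuple_SetItem(t, 1, second) < 0) {
           Py_DECREF(t);
           return NULL;
       }
       return t;  /* same result as Py_BuildValue("(is)", 1, "two") */
   }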
   The index in the "fields" array of the "PyStructSequence_Desc"
   determines which field of the struct sequence is described.

   const char *name

      Name for the field or "NULL" to end the list of named fields,
      set to "PyStructSequence_UnnamedField" to leave unnamed.

   const char *doc

      Field docstring or "NULL" to omit.

const char *const PyStructSequence_UnnamedField

   * Part of the Stable ABI since version 3.11.*

   Special value for a field name to leave it unnamed.

   Changed in version 3.9: The type was changed from "char *".

PyObject *PyStructSequence_New(PyTypeObject *type)

   *Return value: New reference.** Part of the Stable ABI.*

   Creates an instance of *type*, which must have been created with
   "PyStructSequence_NewType()".

   Return "NULL" with an exception set on failure.

PyObject *PyStructSequence_GetItem(PyObject *p, Py_ssize_t pos)

   *Return value: Borrowed reference.** Part of the Stable ABI.*

   Return the object at position *pos* in the struct sequence pointed
   to by *p*.

   Bounds checking is performed as an assertion if Python is built in
   debug mode or "with assertions".

PyObject *PyStructSequence_GET_ITEM(PyObject *p, Py_ssize_t pos)

   *Return value: Borrowed reference.*

   Alias to "PyStructSequence_GetItem()".

   Changed in version 3.13: Now implemented as an alias to
   "PyStructSequence_GetItem()".

void PyStructSequence_SetItem(PyObject *p, Py_ssize_t pos, PyObject *o)

   * Part of the Stable ABI.*

   Sets the field at index *pos* of the struct sequence *p* to value
   *o*. Like "PyTuple_SET_ITEM()", this should only be used to fill
   in brand new instances.

   Bounds checking is performed as an assertion if Python is built in
   debug mode or "with assertions".

   Note: This function “steals” a reference to *o*.

void PyStructSequence_SET_ITEM(PyObject *p, Py_ssize_t pos, PyObject *o)

   Alias to "PyStructSequence_SetItem()".

   Changed in version 3.13: Now implemented as an alias to
   "PyStructSequence_SetItem()".
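To tie the pieces above together, here is a hedged sketch that
creates and populates a struct sequence type; the "mymod.Point" name,
field layout, and "make_point" helper are hypothetical:

   /* Hypothetical field layout; the NULL entry ends the list. */
   static PyStructSequence_Field point_fields[] = {
       {"x", "x coordinate"},
       {"y", "y coordinate"},
       {NULL, NULL}
   };

   static PyStructSequence_Desc point_desc = {
       "mymod.Point",          /* fully qualified name */
       "A simple 2D point.",   /* docstring */
       point_fields,
       2                       /* fields visible on the Python side */
   };

   static PyObject *
   make_point(double x, double y)
   {
       /* The type is normally created once, e.g. during module
          initialization; a static cache stands in for that here. */
       static PyTypeObject *PointType = NULL;
       if (PointType == NULL) {
           PointType = PyStructSequence_NewType(&point_desc);
           if (PointType == NULL) {
               return NULL;
           }
       }
       PyObject *p = PyStructSequence_New(PointType);
       if (p == NULL) {
           return NULL;
       }
       /* PyStructSequence_SetItem() steals the references; error
          checking of the float constructors is elided for brevity. */
       PyStructSequence_SetItem(p, 0, PyFloat_FromDouble(x));
       PyStructSequence_SetItem(p, 1, PyFloat_FromDouble(y));
       return p;
   }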
Type Objects
************

type PyTypeObject

   * Part of the Limited API (as an opaque struct).*

   The C structure of the objects used to describe built-in types.

PyTypeObject PyType_Type

   * Part of the Stable ABI.*

   This is the type object for type objects; it is the same object as
   "type" in the Python layer.

int PyType_Check(PyObject *o)

   Return non-zero if the object *o* is a type object, including
   instances of types derived from the standard type object. Return 0
   in all other cases. This function always succeeds.

int PyType_CheckExact(PyObject *o)

   Return non-zero if the object *o* is a type object, but not a
   subtype of the standard type object. Return 0 in all other cases.
   This function always succeeds.

unsigned int PyType_ClearCache()

   * Part of the Stable ABI.*

   Clear the internal lookup cache. Return the current version tag.

unsigned long PyType_GetFlags(PyTypeObject *type)

   * Part of the Stable ABI.*

   Return the "tp_flags" member of *type*. This function is primarily
   meant for use with "Py_LIMITED_API"; the individual flag bits are
   guaranteed to be stable across Python releases, but access to
   "tp_flags" itself is not part of the limited API.

   Added in version 3.2.

   Changed in version 3.4: The return type is now "unsigned long"
   rather than "long".

PyObject *PyType_GetDict(PyTypeObject *type)

   Return the type object’s internal namespace, which is otherwise
   only exposed via a read-only proxy ("cls.__dict__"). This is a
   replacement for accessing "tp_dict" directly. The returned
   dictionary must be treated as read-only.

   This function is meant for specific embedding and language-binding
   cases, where direct access to the dict is necessary and indirect
   access (e.g. via the proxy or "PyObject_GetAttr()") isn’t
   adequate.

   Extension modules should continue to use "tp_dict", directly or
   indirectly, when setting up their own types.

   Added in version 3.12.

void PyType_Modified(PyTypeObject *type)

   * Part of the Stable ABI.*

   Invalidate the internal lookup cache for the type and all of its
   subtypes. This function must be called after any manual
   modification of the attributes or base classes of the type.

int PyType_AddWatcher(PyType_WatchCallback callback)

   Register *callback* as a type watcher. Return a non-negative
   integer ID which must be passed to future calls to
   "PyType_Watch()". In case of error (e.g. no more watcher IDs
   available), return "-1" and set an exception.

   In free-threaded builds, "PyType_AddWatcher()" is not thread-safe,
   so it must be called at startup (before spawning the first
   thread).

   Added in version 3.12.

int PyType_ClearWatcher(int watcher_id)

   Clear watcher identified by *watcher_id* (previously returned from
   "PyType_AddWatcher()"). Return "0" on success, "-1" on error (e.g.
   if *watcher_id* was never registered).

   An extension should never call "PyType_ClearWatcher" with a
   *watcher_id* that was not returned to it by a previous call to
   "PyType_AddWatcher()".

   Added in version 3.12.

int PyType_Watch(int watcher_id, PyObject *type)

   Mark *type* as watched. The callback granted *watcher_id* by
   "PyType_AddWatcher()" will be called whenever "PyType_Modified()"
   reports a change to *type*. (The callback may be called only once
   for a series of consecutive modifications to *type*, if
   "_PyType_Lookup()" is not called on *type* between the
   modifications; this is an implementation detail and subject to
   change.)

   An extension should never call "PyType_Watch" with a *watcher_id*
   that was not returned to it by a previous call to
   "PyType_AddWatcher()".

   Added in version 3.12.

typedef int (*PyType_WatchCallback)(PyObject *type)

   Type of a type-watcher callback function.

   The callback must not modify *type* or cause "PyType_Modified()"
   to be called on *type* or any type in its MRO; violating this rule
   could cause infinite recursion.

   Added in version 3.12.

int PyType_HasFeature(PyTypeObject *o, int feature)

   Return non-zero if the type object *o* sets the feature *feature*.
   Type features are denoted by single bit flags.

int PyType_IS_GC(PyTypeObject *o)

   Return true if the type object includes support for the cycle
   detector; this tests the type flag "Py_TPFLAGS_HAVE_GC".

int PyType_IsSubtype(PyTypeObject *a, PyTypeObject *b)

   * Part of the Stable ABI.*

   Return true if *a* is a subtype of *b*.

   This function only checks for actual subtypes, which means that
   "__subclasscheck__()" is not called on *b*. Call
   "PyObject_IsSubclass()" to do the same check that "issubclass()"
   would do.

PyObject *PyType_GenericAlloc(PyTypeObject *type, Py_ssize_t nitems)

   *Return value: New reference.** Part of the Stable ABI.*

   Generic handler for the "tp_alloc" slot of a type object. Use
   Python’s default memory allocation mechanism to allocate a new
   instance and initialize all its contents to "NULL".

PyObject *PyType_GenericNew(PyTypeObject *type, PyObject *args, PyObject *kwds)

   *Return value: New reference.** Part of the Stable ABI.*

   Generic handler for the "tp_new" slot of a type object. Create a
   new instance using the type’s "tp_alloc" slot.

int PyType_Ready(PyTypeObject *type)

   * Part of the Stable ABI.*

   Finalize a type object.
   This should be called on all type objects to finish their
   initialization. This function is responsible for adding inherited
   slots from a type’s base class. Return "0" on success, or return
   "-1" and set an exception on error.

   Note: If some of the base classes implement the GC protocol and
     the provided type does not include "Py_TPFLAGS_HAVE_GC" in its
     flags, then the GC protocol will be automatically implemented
     from its parents. Conversely, if the type being created does
     include "Py_TPFLAGS_HAVE_GC" in its flags then it **must**
     implement the GC protocol itself by at least implementing the
     "tp_traverse" handler.

PyObject *PyType_GetName(PyTypeObject *type)

   *Return value: New reference.** Part of the Stable ABI since version 3.11.*

   Return the type’s name. Equivalent to getting the type’s
   "__name__" attribute.

   Added in version 3.11.

PyObject *PyType_GetQualName(PyTypeObject *type)

   *Return value: New reference.** Part of the Stable ABI since version 3.11.*

   Return the type’s qualified name. Equivalent to getting the type’s
   "__qualname__" attribute.

   Added in version 3.11.

PyObject *PyType_GetFullyQualifiedName(PyTypeObject *type)

   * Part of the Stable ABI since version 3.13.*

   Return the type’s fully qualified name. Equivalent to
   "f"{type.__module__}.{type.__qualname__}"", or "type.__qualname__"
   if "type.__module__" is not a string or is equal to ""builtins"".

   Added in version 3.13.

PyObject *PyType_GetModuleName(PyTypeObject *type)

   * Part of the Stable ABI since version 3.13.*

   Return the type’s module name. Equivalent to getting the
   "type.__module__" attribute.

   Added in version 3.13.

void *PyType_GetSlot(PyTypeObject *type, int slot)

   * Part of the Stable ABI since version 3.4.*

   Return the function pointer stored in the given slot. If the
   result is "NULL", this indicates that either the slot is "NULL",
   or that the function was called with invalid parameters. Callers
   will typically cast the result pointer into the appropriate
   function type.

   See "PyType_Slot.slot" for possible values of the *slot* argument.

   Added in version 3.4.

   Changed in version 3.10: "PyType_GetSlot()" can now accept all
   types. Previously, it was limited to heap types.

PyObject *PyType_GetModule(PyTypeObject *type)

   * Part of the Stable ABI since version 3.10.*

   Return the module object associated with the given type when the
   type was created using "PyType_FromModuleAndSpec()".

   If no module is associated with the given type, sets "TypeError"
   and returns "NULL".

   This function is usually used to get the module in which a method
   is defined. Note that in such a method,
   "PyType_GetModule(Py_TYPE(self))" may not return the intended
   result. "Py_TYPE(self)" may be a *subclass* of the intended class,
   and subclasses are not necessarily defined in the same module as
   their superclass. See "PyCMethod" to get the class that defines
   the method. See "PyType_GetModuleByDef()" for cases when
   "PyCMethod" cannot be used.

   Added in version 3.9.

void *PyType_GetModuleState(PyTypeObject *type)

   * Part of the Stable ABI since version 3.10.*

   Return the state of the module object associated with the given
   type. This is a shortcut for calling "PyModule_GetState()" on the
   result of "PyType_GetModule()".

   If no module is associated with the given type, sets "TypeError"
   and returns "NULL".

   If the *type* has an associated module but its state is "NULL",
   returns "NULL" without setting an exception.

   Added in version 3.9.
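As a hedged sketch of the pattern described above, a method defined
with the "METH_METHOD | METH_FASTCALL | METH_KEYWORDS" calling
convention receives its defining class and can therefore reach the
right module state even when invoked on a subclass instance. The
state layout and the "Example_get_unit" name are hypothetical, and
the exact parameter types follow the "PyCMethod" convention as best
understood here:

   /* Hypothetical per-module state. */
   typedef struct {
       PyObject *default_unit;
   } mymod_state;

   static PyObject *
   Example_get_unit(PyObject *self, PyTypeObject *defining_class,
                    PyObject *const *args, size_t nargsf,
                    PyObject *kwnames)
   {
       /* Sets TypeError and returns NULL if no module is associated
          with the defining class. */
       mymod_state *state = PyType_GetModuleState(defining_class);
       if (state == NULL) {
           return NULL;
       }
       return Py_NewRef(state->default_unit);
   }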
PyObject *PyType_GetModuleByDef(PyTypeObject *type, struct PyModuleDef *def)

   * Part of the Stable ABI since version 3.13.*

   Find the first superclass whose module was created from the given
   "PyModuleDef" *def*, and return that module.

   If no module is found, raises a "TypeError" and returns "NULL".

   This function is intended to be used together with
   "PyModule_GetState()" to get module state from slot methods (such
   as "tp_init" or "nb_add") and other places where a method’s
   defining class cannot be passed using the "PyCMethod" calling
   convention.

   Added in version 3.11.

int PyUnstable_Type_AssignVersionTag(PyTypeObject *type)

   *This is Unstable API. It may change without warning in minor
   releases.*

   Attempt to assign a version tag to the given type.

   Returns 1 if the type already had a valid version tag or a new one
   was assigned, or 0 if a new tag could not be assigned.

   Added in version 3.12.


Creating Heap-Allocated Types
=============================

The following functions and structs are used to create heap types.

PyObject *PyType_FromMetaclass(PyTypeObject *metaclass, PyObject *module, PyType_Spec *spec, PyObject *bases)

   * Part of the Stable ABI since version 3.12.*

   Create and return a heap type from the *spec* (see
   "Py_TPFLAGS_HEAPTYPE").

   The metaclass *metaclass* is used to construct the resulting type
   object. When *metaclass* is "NULL", the metaclass is derived from
   *bases* (or *Py_tp_base[s]* slots if *bases* is "NULL", see
   below).

   Metaclasses that override "tp_new" are not supported, except if
   "tp_new" is "NULL". (For backwards compatibility, other
   "PyType_From*" functions allow such metaclasses. They ignore
   "tp_new", which may result in incomplete initialization. This is
   deprecated and in Python 3.14+ such metaclasses will not be
   supported.)

   The *bases* argument can be used to specify base classes; it can
   either be only one class or a tuple of classes. If *bases* is
   "NULL", the *Py_tp_bases* slot is used instead. If that also is
   "NULL", the *Py_tp_base* slot is used instead. If that also is
   "NULL", the new type derives from "object".

   The *module* argument can be used to record the module in which
   the new class is defined. It must be a module object or "NULL". If
   not "NULL", the module is associated with the new type and can
   later be retrieved with "PyType_GetModule()". The associated
   module is not inherited by subclasses; it must be specified for
   each class individually.

   This function calls "PyType_Ready()" on the new type.

   Note that this function does *not* fully match the behavior of
   calling "type()" or using the "class" statement. With user-
   provided base types or metaclasses, prefer calling "type" (or the
   metaclass) over "PyType_From*" functions. Specifically:

   * "__new__()" is not called on the new class (and it must be set
     to "type.__new__").

   * "__init__()" is not called on the new class.

   * "__init_subclass__()" is not called on any bases.

   * "__set_name__()" is not called on new descriptors.

   Added in version 3.12.

PyObject *PyType_FromModuleAndSpec(PyObject *module, PyType_Spec *spec, PyObject *bases)

   *Return value: New reference.** Part of the Stable ABI since version 3.10.*

   Equivalent to "PyType_FromMetaclass(NULL, module, spec, bases)".

   Added in version 3.9.

   Changed in version 3.10: The function now accepts a single class
   as the *bases* argument and "NULL" as the "tp_doc" slot.

   Changed in version 3.12: The function now finds and uses a
   metaclass corresponding to the provided base classes. Previously,
   only "type" instances were returned. The "tp_new" of the metaclass
   is *ignored*,
   which may result in incomplete initialization. Creating classes
   whose metaclass overrides "tp_new" is deprecated, and in Python
   3.14+ it will no longer be allowed.

PyObject *PyType_FromSpecWithBases(PyType_Spec *spec, PyObject *bases)

   *Return value: New reference.** Part of the Stable ABI since version 3.3.*

   Equivalent to "PyType_FromMetaclass(NULL, NULL, spec, bases)".

   Added in version 3.3.

   Changed in version 3.12: The function now finds and uses a
   metaclass corresponding to the provided base classes. Previously,
   only "type" instances were returned. The "tp_new" of the metaclass
   is *ignored*, which may result in incomplete initialization.
   Creating classes whose metaclass overrides "tp_new" is deprecated,
   and in Python 3.14+ it will no longer be allowed.

PyObject *PyType_FromSpec(PyType_Spec *spec)

   *Return value: New reference.** Part of the Stable ABI.*

   Equivalent to "PyType_FromMetaclass(NULL, NULL, spec, NULL)".

   Changed in version 3.12: The function now finds and uses a
   metaclass corresponding to the base classes provided in
   *Py_tp_base[s]* slots. Previously, only "type" instances were
   returned. The "tp_new" of the metaclass is *ignored*, which may
   result in incomplete initialization. Creating classes whose
   metaclass overrides "tp_new" is deprecated, and in Python 3.14+ it
   will no longer be allowed.

type PyType_Spec

   * Part of the Stable ABI (including all members).*

   Structure defining a type’s behavior.

   const char *name

      Name of the type, used to set "PyTypeObject.tp_name".

   int basicsize

      If positive, specifies the size of the instance in bytes. It is
      used to set "PyTypeObject.tp_basicsize".

      If zero, specifies that "tp_basicsize" should be inherited.

      If negative, the absolute value specifies how much space
      instances of the class need *in addition* to the superclass.
      Use "PyObject_GetTypeData()" to get a pointer to subclass-
      specific memory reserved this way. For negative "basicsize",
      Python will insert padding when needed to meet "tp_basicsize"’s
      alignment requirements.

      Changed in version 3.12: Previously, this field could not be
      negative.

   int itemsize

      Size of one element of a variable-size type, in bytes. Used to
      set "PyTypeObject.tp_itemsize". See "tp_itemsize" documentation
      for caveats.

      If zero, "tp_itemsize" is inherited. Extending arbitrary
      variable-sized classes is dangerous, since some types use a
      fixed offset for variable-sized memory, which can then overlap
      fixed-sized memory used by a subclass. To help prevent
      mistakes, inheriting "itemsize" is only possible in the
      following situations:

      * The base is not variable-sized (its "tp_itemsize" is zero).

      * The requested "PyType_Spec.basicsize" is positive, suggesting
        that the memory layout of the base class is known.

      * The requested "PyType_Spec.basicsize" is zero, suggesting
        that the subclass does not access the instance’s memory
        directly.

      * With the "Py_TPFLAGS_ITEMS_AT_END" flag.

   unsigned int flags

      Type flags, used to set "PyTypeObject.tp_flags".

      If the "Py_TPFLAGS_HEAPTYPE" flag is not set,
      "PyType_FromSpecWithBases()" sets it automatically.

   PyType_Slot *slots

      Array of "PyType_Slot" structures. Terminated by the special
      slot value "{0, NULL}".

      Each slot ID should be specified at most once.
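Putting "PyType_Spec" together with the "PyType_Slot" structure
documented just below, a heap type might be created roughly like
this; the "mymod.Counter" type, its method, and the
"add_counter_type" helper are hypothetical:

   typedef struct {
       PyObject_HEAD
       Py_ssize_t count;
   } CounterObject;

   static PyObject *
   Counter_increment(PyObject *self, PyObject *Py_UNUSED(ignored))
   {
       CounterObject *c = (CounterObject *)self;
       c->count++;
       return PyLong_FromSsize_t(c->count);
   }

   static PyMethodDef Counter_methods[] = {
       {"increment", Counter_increment, METH_NOARGS,
        "Increment and return the counter."},
       {NULL, NULL, 0, NULL}
   };

   static PyType_Slot counter_slots[] = {
       {Py_tp_methods, Counter_methods},
       {Py_tp_doc, (void *)"A simple counter."},
       {0, NULL}   /* sentinel */
   };

   static PyType_Spec counter_spec = {
       .name = "mymod.Counter",
       .basicsize = sizeof(CounterObject),
       .itemsize = 0,
       .flags = Py_TPFLAGS_DEFAULT,
       .slots = counter_slots,
   };

   /* Typically called from module initialization: */
   static int
   add_counter_type(PyObject *module)
   {
       PyObject *tp = PyType_FromSpec(&counter_spec);
       if (tp == NULL) {
           return -1;
       }
       int rc = PyModule_AddObjectRef(module, "Counter", tp);
       Py_DECREF(tp);
       return rc;
   }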
type PyType_Slot

   * Part of the Stable ABI (including all members).*

   Structure defining optional functionality of a type, containing a
   slot ID and a value pointer.

   int slot

      A slot ID.

      Slot IDs are named like the field names of the structures
      "PyTypeObject", "PyNumberMethods", "PySequenceMethods",
      "PyMappingMethods" and "PyAsyncMethods" with an added "Py_"
      prefix. For example, use:

      * "Py_tp_dealloc" to set "PyTypeObject.tp_dealloc"

      * "Py_nb_add" to set "PyNumberMethods.nb_add"

      * "Py_sq_length" to set "PySequenceMethods.sq_length"

      The following “offset” fields cannot be set using
      "PyType_Slot":

      * "tp_weaklistoffset" (use "Py_TPFLAGS_MANAGED_WEAKREF" instead
        if possible)

      * "tp_dictoffset" (use "Py_TPFLAGS_MANAGED_DICT" instead if
        possible)

      * "tp_vectorcall_offset" (use ""__vectorcalloffset__"" in
        PyMemberDef)

      If it is not possible to switch to a "MANAGED" flag (for
      example, for vectorcall or to support Python older than 3.12),
      specify the offset in "Py_tp_members". See PyMemberDef
      documentation for details.

      The following fields cannot be set at all when creating a heap
      type:

      * "tp_vectorcall" (use "tp_new" and/or "tp_init")

      * Internal fields: "tp_dict", "tp_mro", "tp_cache",
        "tp_subclasses", and "tp_weaklist".

      Setting "Py_tp_bases" or "Py_tp_base" may be problematic on
      some platforms. To avoid issues, use the *bases* argument of
      "PyType_FromSpecWithBases()" instead.

      Changed in version 3.9: Slots in "PyBufferProcs" may be set in
      the unlimited API.

      Changed in version 3.11: "bf_getbuffer" and "bf_releasebuffer"
      are now available under the limited API.

   void *pfunc

      The desired value of the slot. In most cases, this is a pointer
      to a function.

      Slots other than "Py_tp_doc" may not be "NULL".


Objects for Type Hinting
************************

Various built-in types for type hinting are provided. Currently, two
types exist – GenericAlias and Union. Only "GenericAlias" is exposed
to C.

PyObject *Py_GenericAlias(PyObject *origin, PyObject *args)

   * Part of the Stable ABI since version 3.9.*

   Create a GenericAlias object. Equivalent to calling the Python
   class "types.GenericAlias". The *origin* and *args* arguments set
   the "GenericAlias"‘s "__origin__" and "__args__" attributes
   respectively. *origin* should be a PyTypeObject*, and *args* can
   be a PyTupleObject* or any "PyObject*". If the *args* passed is
   not a tuple, a 1-tuple is automatically constructed and
   "__args__" is set to "(args,)". Minimal checking is done for the
   arguments, so the function will succeed even if *origin* is not a
   type. The "GenericAlias"‘s "__parameters__" attribute is
   constructed lazily from "__args__". On failure, an exception is
   raised and "NULL" is returned.

   Here’s an example of how to make an extension type generic:

      ...
      static PyMethodDef my_obj_methods[] = {
          // Other methods.
          ...
          {"__class_getitem__", Py_GenericAlias, METH_O|METH_CLASS,
           "See PEP 585"}
          ...
      }

   See also: The data model method "__class_getitem__()".

   Added in version 3.9.

PyTypeObject Py_GenericAliasType

   * Part of the Stable ABI since version 3.9.*

   The C type of the object returned by "Py_GenericAlias()".
   Equivalent to "types.GenericAlias" in Python.

   Added in version 3.9.


Type Object Structures
**********************

Perhaps one of the most important structures of the Python object
system is the structure that defines a new type: the "PyTypeObject"
structure. Type objects can be handled using any of the "PyObject_*"
or "PyType_*" functions, but do not offer much that’s interesting to
most Python applications. These objects are fundamental to how
objects behave, so they are very important to the interpreter itself
and to any extension module that implements new types.

Type objects are fairly large compared to most of the standard types.
The reason for the size is that each type object stores a large number of values, mostly C function pointers, each of which implements a small part of the type’s functionality. The fields of the type object are examined in detail in this section. The fields will be described in the order in which they occur in the structure. In addition to the following quick reference, the Examples section provides at-a-glance insight into the meaning and use of "PyTypeObject". Quick Reference =============== “tp slots” ---------- +--------------------+--------------------+--------------------+----+----+----+----+ | PyTypeObject Slot | Type | special | Info [2] | | [1] | | methods/attrs | | | | | +----+----+----+----+ | | | | O | T | D | I | | | | | | | | | |====================|====================|====================|====|====|====|====| | "tp_name" | const char * | __name__ | X | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_basicsize" | "Py_ssize_t" | | X | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_itemsize" | "Py_ssize_t" | | | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_dealloc" | "destructor" | | X | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_vectorcall_of | "Py_ssize_t" | | | X | | X | | fset" | | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ("tp_getattr") | "getattrfunc" | __getattribute__, | | | | G | | | | __getattr__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ("tp_setattr") | "setattrfunc" | __setattr__, | | | | G | | | | __delattr__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_async" | "PyAsyncMethods" * | sub-slots | | | | % | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_repr" | "reprfunc" | __repr__ | X | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_number" | "PyNumberMethods" | sub-slots | | | | % | | | * | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_sequence" | "PySequenceMethod | sub-slots | | | | % | | | s" * | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_mapping" | "PyMappingMethods" | sub-slots | | | | % | | | * | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_hash" | "hashfunc" | __hash__ | X | | | G | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_call" | "ternaryfunc" | __call__ | | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_str" | "reprfunc" | __str__ | X | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_getattro" | "getattrofunc" | __getattribute__, | X | X | | G | | | | __getattr__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_setattro" | "setattrofunc" | __setattr__, | X | X | | G | | | | __delattr__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_buffer" | "PyBufferProcs" * | sub-slots | | | | % | 
+--------------------+--------------------+--------------------+----+----+----+----+ | "tp_flags" | unsigned long | | X | X | | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_doc" | const char * | __doc__ | X | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_traverse" | "traverseproc" | | | X | | G | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_clear" | "inquiry" | | | X | | G | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_richcompare" | "richcmpfunc" | __lt__, __le__, | X | | | G | | | | __eq__, __ne__, | | | | | | | | __gt__, __ge__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ("tp_weaklistoffs | "Py_ssize_t" | | | X | | ? | | et") | | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_iter" | "getiterfunc" | __iter__ | | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_iternext" | "iternextfunc" | __next__ | | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_methods" | "PyMethodDef" [] | | X | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_members" | "PyMemberDef" [] | | | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_getset" | "PyGetSetDef" [] | | X | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_base" | "PyTypeObject" * | __base__ | | | X | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_dict" | "PyObject" * | __dict__ | | | ? | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_descr_get" | "descrgetfunc" | __get__ | | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_descr_set" | "descrsetfunc" | __set__, | | | | X | | | | __delete__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ("tp_dictoffset") | "Py_ssize_t" | | | X | | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_init" | "initproc" | __init__ | X | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_alloc" | "allocfunc" | | X | | ? | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_new" | "newfunc" | __new__ | X | X | ? | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_free" | "freefunc" | | X | X | ? | ? 
|
+--------------------+--------------------+--------------------+----+----+----+----+
| "tp_is_gc"         | "inquiry"          |                    |    | X  |    | X  |
+--------------------+--------------------+--------------------+----+----+----+----+
| <"tp_bases">       | "PyObject" *       | __bases__          |    |    | ~  |    |
+--------------------+--------------------+--------------------+----+----+----+----+
| <"tp_mro">         | "PyObject" *       | __mro__            |    |    | ~  |    |
+--------------------+--------------------+--------------------+----+----+----+----+
| ["tp_cache"]       | "PyObject" *       |                    |    |    |    |    |
+--------------------+--------------------+--------------------+----+----+----+----+
| ["tp_subclasses"]  | void *             | __subclasses__     |    |    |    |    |
+--------------------+--------------------+--------------------+----+----+----+----+
| ["tp_weaklist"]    | "PyObject" *       |                    |    |    |    |    |
+--------------------+--------------------+--------------------+----+----+----+----+
| ("tp_del")         | "destructor"       |                    |    |    |    |    |
+--------------------+--------------------+--------------------+----+----+----+----+
| ["tp_version_tag"] | unsigned int       |                    |    |    |    |    |
+--------------------+--------------------+--------------------+----+----+----+----+
| "tp_finalize"      | "destructor"       | __del__            |    |    |    | X  |
+--------------------+--------------------+--------------------+----+----+----+----+
| "tp_vectorcall"    | "vectorcallfunc"   |                    |    |    |    |    |
+--------------------+--------------------+--------------------+----+----+----+----+
| ["tp_watched"]     | unsigned char      |                    |    |    |    |    |
+--------------------+--------------------+--------------------+----+----+----+----+

[1] **()**: A slot name in parentheses indicates it is (effectively)
    deprecated.

    **<>**: Names in angle brackets should be initially set to "NULL"
    and treated as read-only.

    **[]**: Names in square brackets are for internal use only.

    **<R>** (as a prefix) means the field is required (must be
    non-"NULL").

[2] Columns:

    **“O”**: set on "PyBaseObject_Type"

    **“T”**: set on "PyType_Type"

    **“D”**: default (if slot is set to "NULL")

       X - PyType_Ready sets this value if it is NULL

       ~ - PyType_Ready always sets this value (it should be NULL)

       ? - PyType_Ready may set this value depending on other slots

       Also see the inheritance column ("I").

    **“I”**: inheritance

       X - type slot is inherited via *PyType_Ready* if defined with
       a *NULL* value

       % - the slots of the sub-struct are inherited individually

       G - inherited, but only in combination with other slots; see
       the slot's description

       ? - it's complicated; see the slot's description

Note that some slots are effectively inherited through the normal
attribute lookup chain.
sub-slots --------- +----------------------------+-------------------+--------------+ | Slot | Type | special | | | | methods | |============================|===================|==============| | "am_await" | "unaryfunc" | __await__ | +----------------------------+-------------------+--------------+ | "am_aiter" | "unaryfunc" | __aiter__ | +----------------------------+-------------------+--------------+ | "am_anext" | "unaryfunc" | __anext__ | +----------------------------+-------------------+--------------+ | "am_send" | "sendfunc" | | +----------------------------+-------------------+--------------+ | | +----------------------------+-------------------+--------------+ | "nb_add" | "binaryfunc" | __add__ | | | | __radd__ | +----------------------------+-------------------+--------------+ | "nb_inplace_add" | "binaryfunc" | __iadd__ | +----------------------------+-------------------+--------------+ | "nb_subtract" | "binaryfunc" | __sub__ | | | | __rsub__ | +----------------------------+-------------------+--------------+ | "nb_inplace_subtract" | "binaryfunc" | __isub__ | +----------------------------+-------------------+--------------+ | "nb_multiply" | "binaryfunc" | __mul__ | | | | __rmul__ | +----------------------------+-------------------+--------------+ | "nb_inplace_multiply" | "binaryfunc" | __imul__ | +----------------------------+-------------------+--------------+ | "nb_remainder" | "binaryfunc" | __mod__ | | | | __rmod__ | +----------------------------+-------------------+--------------+ | "nb_inplace_remainder" | "binaryfunc" | __imod__ | +----------------------------+-------------------+--------------+ | "nb_divmod" | "binaryfunc" | __divmod__ | | | | __rdivmod__ | +----------------------------+-------------------+--------------+ | "nb_power" | "ternaryfunc" | __pow__ | | | | __rpow__ | +----------------------------+-------------------+--------------+ | "nb_inplace_power" | "ternaryfunc" | __ipow__ | +----------------------------+-------------------+--------------+ | "nb_negative" | "unaryfunc" | __neg__ | +----------------------------+-------------------+--------------+ | "nb_positive" | "unaryfunc" | __pos__ | +----------------------------+-------------------+--------------+ | "nb_absolute" | "unaryfunc" | __abs__ | +----------------------------+-------------------+--------------+ | "nb_bool" | "inquiry" | __bool__ | +----------------------------+-------------------+--------------+ | "nb_invert" | "unaryfunc" | __invert__ | +----------------------------+-------------------+--------------+ | "nb_lshift" | "binaryfunc" | __lshift__ | | | | __rlshift__ | +----------------------------+-------------------+--------------+ | "nb_inplace_lshift" | "binaryfunc" | __ilshift__ | +----------------------------+-------------------+--------------+ | "nb_rshift" | "binaryfunc" | __rshift__ | | | | __rrshift__ | +----------------------------+-------------------+--------------+ | "nb_inplace_rshift" | "binaryfunc" | __irshift__ | +----------------------------+-------------------+--------------+ | "nb_and" | "binaryfunc" | __and__ | | | | __rand__ | +----------------------------+-------------------+--------------+ | "nb_inplace_and" | "binaryfunc" | __iand__ | +----------------------------+-------------------+--------------+ | "nb_xor" | "binaryfunc" | __xor__ | | | | __rxor__ | +----------------------------+-------------------+--------------+ | "nb_inplace_xor" | "binaryfunc" | __ixor__ | +----------------------------+-------------------+--------------+ | "nb_or" | "binaryfunc" | __or__ 
| | | | __ror__ | +----------------------------+-------------------+--------------+ | "nb_inplace_or" | "binaryfunc" | __ior__ | +----------------------------+-------------------+--------------+ | "nb_int" | "unaryfunc" | __int__ | +----------------------------+-------------------+--------------+ | "nb_reserved" | void * | | +----------------------------+-------------------+--------------+ | "nb_float" | "unaryfunc" | __float__ | +----------------------------+-------------------+--------------+ | "nb_floor_divide" | "binaryfunc" | __floordiv__ | +----------------------------+-------------------+--------------+ | "nb_inplace_floor_divide" | "binaryfunc" | __ifloordiv | | | | __ | +----------------------------+-------------------+--------------+ | "nb_true_divide" | "binaryfunc" | __truediv__ | +----------------------------+-------------------+--------------+ | "nb_inplace_true_divide" | "binaryfunc" | __itruediv__ | +----------------------------+-------------------+--------------+ | "nb_index" | "unaryfunc" | __index__ | +----------------------------+-------------------+--------------+ | "nb_matrix_multiply" | "binaryfunc" | __matmul__ | | | | __rmatmul__ | +----------------------------+-------------------+--------------+ | "nb_inplace_matrix_multip | "binaryfunc" | __imatmul__ | | ly" | | | +----------------------------+-------------------+--------------+ | | +----------------------------+-------------------+--------------+ | "mp_length" | "lenfunc" | __len__ | +----------------------------+-------------------+--------------+ | "mp_subscript" | "binaryfunc" | __getitem__ | +----------------------------+-------------------+--------------+ | "mp_ass_subscript" | "objobjargproc" | __setitem__, | | | | __delitem__ | +----------------------------+-------------------+--------------+ | | +----------------------------+-------------------+--------------+ | "sq_length" | "lenfunc" | __len__ | +----------------------------+-------------------+--------------+ | "sq_concat" | "binaryfunc" | __add__ | +----------------------------+-------------------+--------------+ | "sq_repeat" | "ssizeargfunc" | __mul__ | +----------------------------+-------------------+--------------+ | "sq_item" | "ssizeargfunc" | __getitem__ | +----------------------------+-------------------+--------------+ | "sq_ass_item" | "ssizeobjargproc" | __setitem__ | | | | __delitem__ | +----------------------------+-------------------+--------------+ | "sq_contains" | "objobjproc" | __contains__ | +----------------------------+-------------------+--------------+ | "sq_inplace_concat" | "binaryfunc" | __iadd__ | +----------------------------+-------------------+--------------+ | "sq_inplace_repeat" | "ssizeargfunc" | __imul__ | +----------------------------+-------------------+--------------+ | | +----------------------------+-------------------+--------------+ | "bf_getbuffer" | "getbufferproc()" | __buffer__ | +----------------------------+-------------------+--------------+ | "bf_releasebuffer" | "releasebufferpr | __release_b | | | oc()" | uffer__ | +----------------------------+-------------------+--------------+ slot typedefs ------------- +-------------------------------+-------------------------------+------------------------+ | typedef | Parameter Types | Return Type | |===============================|===============================|========================| | "allocfunc" | "PyTypeObject" * "Py_ssize_t" | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "destructor" | 
"PyObject" * | void | +-------------------------------+-------------------------------+------------------------+ | "freefunc" | void * | void | +-------------------------------+-------------------------------+------------------------+ | "traverseproc" | "PyObject" * "visitproc" void | int | | | * | | +-------------------------------+-------------------------------+------------------------+ | "newfunc" | "PyTypeObject" * "PyObject" * | "PyObject" * | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "initproc" | "PyObject" * "PyObject" * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "reprfunc" | "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "getattrfunc" | "PyObject" * const char * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "setattrfunc" | "PyObject" * const char * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "getattrofunc" | "PyObject" * "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "setattrofunc" | "PyObject" * "PyObject" * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "descrgetfunc" | "PyObject" * "PyObject" * | "PyObject" * | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "descrsetfunc" | "PyObject" * "PyObject" * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "hashfunc" | "PyObject" * | Py_hash_t | +-------------------------------+-------------------------------+------------------------+ | "richcmpfunc" | "PyObject" * "PyObject" * int | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "getiterfunc" | "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "iternextfunc" | "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "lenfunc" | "PyObject" * | "Py_ssize_t" | +-------------------------------+-------------------------------+------------------------+ | "getbufferproc" | "PyObject" * "Py_buffer" * | int | | | int | | +-------------------------------+-------------------------------+------------------------+ | "releasebufferproc" | "PyObject" * "Py_buffer" * | void | +-------------------------------+-------------------------------+------------------------+ | "inquiry" | "PyObject" * | int | +-------------------------------+-------------------------------+------------------------+ | "unaryfunc" | "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "binaryfunc" | "PyObject" * "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "ternaryfunc" | "PyObject" * "PyObject" * | "PyObject" * | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "ssizeargfunc" | "PyObject" * "Py_ssize_t" | "PyObject" * | 
+-------------------------------+-------------------------------+------------------------+
| "ssizeobjargproc"             | "PyObject" * "Py_ssize_t"     | int                    |
|                               | "PyObject" *                  |                        |
+-------------------------------+-------------------------------+------------------------+
| "objobjproc"                  | "PyObject" * "PyObject" *     | int                    |
+-------------------------------+-------------------------------+------------------------+
| "objobjargproc"               | "PyObject" * "PyObject" *     | int                    |
|                               | "PyObject" *                  |                        |
+-------------------------------+-------------------------------+------------------------+

See Slot Type typedefs below for more detail.


PyTypeObject Definition
=======================

The structure definition for "PyTypeObject" can be found in
"Include/cpython/object.h". For convenience of reference, this
repeats the definition found there:

   typedef struct _typeobject {
       PyObject_VAR_HEAD
       const char *tp_name; /* For printing, in format
                               "<module>.<name>" */
       Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

       /* Methods to implement standard operations */

       destructor tp_dealloc;
       Py_ssize_t tp_vectorcall_offset;
       getattrfunc tp_getattr;
       setattrfunc tp_setattr;
       PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2)
                                       or tp_reserved (Python 3) */
       reprfunc tp_repr;

       /* Method suites for standard classes */

       PyNumberMethods *tp_as_number;
       PySequenceMethods *tp_as_sequence;
       PyMappingMethods *tp_as_mapping;

       /* More standard operations (here for binary compatibility) */

       hashfunc tp_hash;
       ternaryfunc tp_call;
       reprfunc tp_str;
       getattrofunc tp_getattro;
       setattrofunc tp_setattro;

       /* Functions to access object as input/output buffer */
       PyBufferProcs *tp_as_buffer;

       /* Flags to define presence of optional/expanded features */
       unsigned long tp_flags;

       const char *tp_doc; /* Documentation string */

       /* Assigned meaning in release 2.0 */
       /* call function for all accessible objects */
       traverseproc tp_traverse;

       /* delete references to contained objects */
       inquiry tp_clear;

       /* Assigned meaning in release 2.1 */
       /* rich comparisons */
       richcmpfunc tp_richcompare;

       /* weak reference enabler */
       Py_ssize_t tp_weaklistoffset;

       /* Iterators */
       getiterfunc tp_iter;
       iternextfunc tp_iternext;

       /* Attribute descriptor and subclassing stuff */
       struct PyMethodDef *tp_methods;
       struct PyMemberDef *tp_members;
       struct PyGetSetDef *tp_getset;
       // Strong reference on a heap type, borrowed reference on a static type
       struct _typeobject *tp_base;
       PyObject *tp_dict;
       descrgetfunc tp_descr_get;
       descrsetfunc tp_descr_set;
       Py_ssize_t tp_dictoffset;
       initproc tp_init;
       allocfunc tp_alloc;
       newfunc tp_new;
       freefunc tp_free; /* Low-level free-memory routine */
       inquiry tp_is_gc; /* For PyObject_IS_GC */
       PyObject *tp_bases;
       PyObject *tp_mro; /* method resolution order */
       PyObject *tp_cache;
       PyObject *tp_subclasses;
       PyObject *tp_weaklist;
       destructor tp_del;

       /* Type attribute cache version tag. Added in version 2.6 */
       unsigned int tp_version_tag;

       destructor tp_finalize;
       vectorcallfunc tp_vectorcall;

       /* bitset of which type-watchers care about this type */
       unsigned char tp_watched;
   } PyTypeObject;


PyObject Slots
==============

The type object structure extends the "PyVarObject" structure. The
"ob_size" field is used for dynamic types (created by "type_new()",
usually called from a class statement). Note that "PyType_Type" (the
metatype) initializes "tp_itemsize", which means that its instances
(i.e. type objects) *must* have the "ob_size" field.
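For instance, a statically allocated type object conventionally
starts with "PyVarObject_HEAD_INIT", which fills in "ob_refcnt",
"ob_type" and "ob_size" (all discussed next); the "Foo" names in this
hedged sketch are hypothetical:

   typedef struct {
       PyObject_HEAD
       /* instance data would go here */
   } FooObject;

   static PyTypeObject Foo_Type = {
       /* ob_type is left NULL here and fixed up before use, as
          described under PyObject.ob_type below. */
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "foo.Foo",
       .tp_basicsize = sizeof(FooObject),
       .tp_flags = Py_TPFLAGS_DEFAULT,
   };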
Py_ssize_t PyObject.ob_refcnt * Part of the Stable ABI.* This is the type object’s reference count, initialized to "1" by the "PyObject_HEAD_INIT" macro. Note that for statically allocated type objects, the type’s instances (objects whose "ob_type" points back to the type) do *not* count as references. But for dynamically allocated type objects, the instances *do* count as references. **Inheritance:** This field is not inherited by subtypes. PyTypeObject *PyObject.ob_type * Part of the Stable ABI.* This is the type’s type, in other words its metatype. It is initialized by the argument to the "PyObject_HEAD_INIT" macro, and its value should normally be "&PyType_Type". However, for dynamically loadable extension modules that must be usable on Windows (at least), the compiler complains that this is not a valid initializer. Therefore, the convention is to pass "NULL" to the "PyObject_HEAD_INIT" macro and to initialize this field explicitly at the start of the module’s initialization function, before doing anything else. This is typically done like this: Foo_Type.ob_type = &PyType_Type; This should be done before any instances of the type are created. "PyType_Ready()" checks if "ob_type" is "NULL", and if so, initializes it to the "ob_type" field of the base class. "PyType_Ready()" will not change this field if it is non-zero. **Inheritance:** This field is inherited by subtypes. PyVarObject Slots ================= Py_ssize_t PyVarObject.ob_size * Part of the Stable ABI.* For statically allocated type objects, this should be initialized to zero. For dynamically allocated type objects, this field has a special internal meaning. This field should be accessed using the "Py_SIZE()" and "Py_SET_SIZE()" macros. **Inheritance:** This field is not inherited by subtypes. PyTypeObject Slots ================== Each slot has a section describing inheritance. If "PyType_Ready()" may set a value when the field is set to "NULL" then there will also be a “Default” section. (Note that many fields set on "PyBaseObject_Type" and "PyType_Type" effectively act as defaults.) const char *PyTypeObject.tp_name Pointer to a NUL-terminated string containing the name of the type. For types that are accessible as module globals, the string should be the full module name, followed by a dot, followed by the type name; for built-in types, it should be just the type name. If the module is a submodule of a package, the full package name is part of the full module name. For example, a type named "T" defined in module "M" in subpackage "Q" in package "P" should have the "tp_name" initializer ""P.Q.M.T"". For dynamically allocated type objects, this should just be the type name, and the module name explicitly stored in the type dict as the value for key "'__module__'". For statically allocated type objects, the *tp_name* field should contain a dot. Everything before the last dot is made accessible as the "__module__" attribute, and everything after the last dot is made accessible as the "__name__" attribute. If no dot is present, the entire "tp_name" field is made accessible as the "__name__" attribute, and the "__module__" attribute is undefined (unless explicitly set in the dictionary, as explained above). This means your type will be impossible to pickle. Additionally, it will not be listed in module documentations created with pydoc. This field must not be "NULL". It is the only required field in "PyTypeObject()" (other than potentially "tp_itemsize"). **Inheritance:** This field is not inherited by subtypes. 
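   For example (a hedged sketch reusing the hypothetical names from
   the paragraph above), a type "T" in module "M" of subpackage "Q"
   in package "P" would be declared so that the interpreter can split
   "__module__" from "__name__" at the last dot:

      static PyTypeObject T_Type = {
          PyVarObject_HEAD_INIT(NULL, 0)
          /* Everything before the last dot becomes __module__
             ("P.Q.M"); everything after it becomes __name__ ("T"). */
          .tp_name = "P.Q.M.T",
      };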
Py_ssize_t PyTypeObject.tp_basicsize
Py_ssize_t PyTypeObject.tp_itemsize

   These fields allow calculating the size in bytes of instances of
   the type.

   There are two kinds of types: types with fixed-length instances
   have a zero "tp_itemsize" field, types with variable-length
   instances have a non-zero "tp_itemsize" field. For a type with
   fixed-length instances, all instances have the same size, given in
   "tp_basicsize". (Exceptions to this rule can be made using
   "PyUnstable_Object_GC_NewWithExtraData()".)

   For a type with variable-length instances, the instances must have
   an "ob_size" field, and the instance size is "tp_basicsize" plus N
   times "tp_itemsize", where N is the “length” of the object.
   Functions like "PyObject_NewVar()" will take the value of N as an
   argument, and store it in the instance’s "ob_size" field. Note
   that the "ob_size" field may later be used for other purposes. For
   example, "int" instances use the bits of "ob_size" in an
   implementation-defined way; the underlying storage and its size
   should be accessed using "PyLong_Export()".

   Note: The "ob_size" field should be accessed using the "Py_SIZE()"
     and "Py_SET_SIZE()" macros.

   Also, the presence of an "ob_size" field in the instance layout
   doesn’t mean that the instance structure is variable-length. For
   example, the "list" type has fixed-length instances, yet those
   instances have an "ob_size" field. (As with "int", avoid reading
   lists’ "ob_size" directly. Call "PyList_Size()" instead.)

   The "tp_basicsize" includes size needed for data of the type’s
   "tp_base", plus any extra data needed by each instance.

   The correct way to set "tp_basicsize" is to use the "sizeof"
   operator on the struct used to declare the instance layout. This
   struct must include the struct used to declare the base type. In
   other words, "tp_basicsize" must be greater than or equal to the
   base’s "tp_basicsize".

   Since every type is a subtype of "object", this struct must
   include "PyObject" or "PyVarObject" (depending on whether
   "ob_size" should be included). These are usually defined by the
   macro "PyObject_HEAD" or "PyObject_VAR_HEAD", respectively.

   The basic size does not include the GC header size, as that header
   is not part of "PyObject_HEAD".

   For cases where the struct used to declare the base type is
   unknown, see "PyType_Spec.basicsize" and "PyType_FromMetaclass()".

   Notes about alignment:

   * "tp_basicsize" must be a multiple of "_Alignof(PyObject)". When
     using "sizeof" on a "struct" that includes "PyObject_HEAD", as
     recommended, the compiler ensures this. When not using a C
     "struct", or when using compiler extensions like
     "__attribute__((packed))", it is up to you.

   * If the variable items require a particular alignment,
     "tp_basicsize" and "tp_itemsize" must each be a multiple of that
     alignment. For example, if a type’s variable part stores a
     "double", it is your responsibility to ensure that both fields
     are a multiple of "_Alignof(double)".

   **Inheritance:**

   These fields are inherited separately by subtypes. (That is, if
   the field is set to zero, "PyType_Ready()" will copy the value
   from the base type, indicating that the instances do not need
   additional storage.)

   If the base type has a non-zero "tp_itemsize", it is generally not
   safe to set "tp_itemsize" to a different non-zero value in a
   subtype (though this depends on the implementation of the base
   type).

destructor PyTypeObject.tp_dealloc

   A pointer to the instance destructor function.
   This function must be defined unless the type guarantees that its
   instances will never be deallocated (as is the case for the
   singletons "None" and "Ellipsis"). The function signature is:

      void tp_dealloc(PyObject *self);

   The destructor function is called by the "Py_DECREF()" and
   "Py_XDECREF()" macros when the new reference count is zero. At
   this point, the instance is still in existence, but there are no
   references to it. The destructor function should free all
   references which the instance owns, free all memory buffers owned
   by the instance (using the freeing function corresponding to the
   allocation function used to allocate the buffer), and call the
   type’s "tp_free" function. If the type is not subtypable (doesn’t
   have the "Py_TPFLAGS_BASETYPE" flag bit set), it is permissible to
   call the object deallocator directly instead of via "tp_free". The
   object deallocator should be the one used to allocate the
   instance; this is normally "PyObject_Del()" if the instance was
   allocated using "PyObject_New" or "PyObject_NewVar", or
   "PyObject_GC_Del()" if the instance was allocated using
   "PyObject_GC_New" or "PyObject_GC_NewVar".

   If the type supports garbage collection (has the
   "Py_TPFLAGS_HAVE_GC" flag bit set), the destructor should call
   "PyObject_GC_UnTrack()" before clearing any member fields.

      static void
      foo_dealloc(foo_object *self)
      {
          PyObject_GC_UnTrack(self);
          Py_CLEAR(self->ref);
          Py_TYPE(self)->tp_free((PyObject *)self);
      }

   Finally, if the type is heap allocated ("Py_TPFLAGS_HEAPTYPE"),
   the deallocator should release the owned reference to its type
   object (via "Py_DECREF()") after calling the type deallocator. In
   order to avoid dangling pointers, the recommended way to achieve
   this is:

      static void
      foo_dealloc(foo_object *self)
      {
          PyTypeObject *tp = Py_TYPE(self);
          // free references and buffers here
          tp->tp_free(self);
          Py_DECREF(tp);
      }

   Warning: In a garbage collected Python, "tp_dealloc" may be called
     from any Python thread, not just the thread which created the
     object (if the object becomes part of a refcount cycle, that
     cycle might be collected by a garbage collection on any thread).
     This is not a problem for Python API calls, since the thread on
     which "tp_dealloc" is called will own the Global Interpreter
     Lock (GIL). However, if the object being destroyed in turn
     destroys objects from some other C or C++ library, care should
     be taken to ensure that destroying those objects on the thread
     which called "tp_dealloc" will not violate any assumptions of
     the library.

   **Inheritance:**

   This field is inherited by subtypes.

Py_ssize_t PyTypeObject.tp_vectorcall_offset

   An optional offset to a per-instance function that implements
   calling the object using the vectorcall protocol, a more efficient
   alternative to the simpler "tp_call".

   This field is only used if the flag "Py_TPFLAGS_HAVE_VECTORCALL"
   is set. If so, this must be a positive integer containing the
   offset in the instance of a "vectorcallfunc" pointer.

   The *vectorcallfunc* pointer may be "NULL", in which case the
   instance behaves as if "Py_TPFLAGS_HAVE_VECTORCALL" was not set:
   calling the instance falls back to "tp_call".

   Any class that sets "Py_TPFLAGS_HAVE_VECTORCALL" must also set
   "tp_call" and make sure its behaviour is consistent with the
   *vectorcallfunc* function. This can be done by setting *tp_call*
   to "PyVectorcall_Call()".

   Changed in version 3.8: Before version 3.8, this slot was named
   "tp_print". In Python 2.x, it was used for printing to a file. In
   Python 3.0 to 3.7, it was unused.
Changed in version 3.12: Before version 3.12, it was not recommended for mutable heap types to implement the vectorcall protocol. When a user sets "__call__" in Python code, only *tp_call* is updated, likely making it inconsistent with the vectorcall function. Since 3.12, setting "__call__" will disable vectorcall optimization by clearing the "Py_TPFLAGS_HAVE_VECTORCALL" flag. **Inheritance:** This field is always inherited. However, the "Py_TPFLAGS_HAVE_VECTORCALL" flag is not always inherited. If it’s not set, then the subclass won’t use vectorcall, except when "PyVectorcall_Call()" is explicitly called. getattrfunc PyTypeObject.tp_getattr An optional pointer to the get-attribute-string function. This field is deprecated. When it is defined, it should point to a function that acts the same as the "tp_getattro" function, but taking a C string instead of a Python string object to give the attribute name. **Inheritance:** Group: "tp_getattr", "tp_getattro" This field is inherited by subtypes together with "tp_getattro": a subtype inherits both "tp_getattr" and "tp_getattro" from its base type when the subtype’s "tp_getattr" and "tp_getattro" are both "NULL". setattrfunc PyTypeObject.tp_setattr An optional pointer to the function for setting and deleting attributes. This field is deprecated. When it is defined, it should point to a function that acts the same as the "tp_setattro" function, but taking a C string instead of a Python string object to give the attribute name. **Inheritance:** Group: "tp_setattr", "tp_setattro" This field is inherited by subtypes together with "tp_setattro": a subtype inherits both "tp_setattr" and "tp_setattro" from its base type when the subtype’s "tp_setattr" and "tp_setattro" are both "NULL". PyAsyncMethods *PyTypeObject.tp_as_async Pointer to an additional structure that contains fields relevant only to objects which implement *awaitable* and *asynchronous iterator* protocols at the C-level. See Async Object Structures for details. Added in version 3.5: Formerly known as "tp_compare" and "tp_reserved". **Inheritance:** The "tp_as_async" field is not inherited, but the contained fields are inherited individually. reprfunc PyTypeObject.tp_repr An optional pointer to a function that implements the built-in function "repr()". The signature is the same as for "PyObject_Repr()": PyObject *tp_repr(PyObject *self); The function must return a string or a Unicode object. Ideally, this function should return a string that, when passed to "eval()", given a suitable environment, returns an object with the same value. If this is not feasible, it should return a string starting with "'<'" and ending with "'>'" from which both the type and the value of the object can be deduced. **Inheritance:** This field is inherited by subtypes. **Default:** When this field is not set, a string of the form "<%s object at %p>" is returned, where "%s" is replaced by the type name, and "%p" by the object’s memory address. PyNumberMethods *PyTypeObject.tp_as_number Pointer to an additional structure that contains fields relevant only to objects which implement the number protocol. These fields are documented in Number Object Structures. **Inheritance:** The "tp_as_number" field is not inherited, but the contained fields are inherited individually. PySequenceMethods *PyTypeObject.tp_as_sequence Pointer to an additional structure that contains fields relevant only to objects which implement the sequence protocol. These fields are documented in Sequence Object Structures. 
**Inheritance:** The "tp_as_sequence" field is not inherited, but the contained fields are inherited individually. PyMappingMethods *PyTypeObject.tp_as_mapping Pointer to an additional structure that contains fields relevant only to objects which implement the mapping protocol. These fields are documented in Mapping Object Structures. **Inheritance:** The "tp_as_mapping" field is not inherited, but the contained fields are inherited individually. hashfunc PyTypeObject.tp_hash An optional pointer to a function that implements the built-in function "hash()". The signature is the same as for "PyObject_Hash()": Py_hash_t tp_hash(PyObject *); The value "-1" should not be returned as a normal return value; when an error occurs during the computation of the hash value, the function should set an exception and return "-1". When this field is not set (*and* "tp_richcompare" is not set), an attempt to take the hash of the object raises "TypeError". This is the same as setting it to "PyObject_HashNotImplemented()". This field can be set explicitly to "PyObject_HashNotImplemented()" to block inheritance of the hash method from a parent type. This is interpreted as the equivalent of "__hash__ = None" at the Python level, causing "isinstance(o, collections.Hashable)" to correctly return "False". Note that the converse is also true - setting "__hash__ = None" on a class at the Python level will result in the "tp_hash" slot being set to "PyObject_HashNotImplemented()". **Inheritance:** Group: "tp_hash", "tp_richcompare" This field is inherited by subtypes together with "tp_richcompare": a subtype inherits both of "tp_richcompare" and "tp_hash", when the subtype’s "tp_richcompare" and "tp_hash" are both "NULL". **Default:** "PyBaseObject_Type" uses "PyObject_GenericHash()". ternaryfunc PyTypeObject.tp_call An optional pointer to a function that implements calling the object. This should be "NULL" if the object is not callable. The signature is the same as for "PyObject_Call()": PyObject *tp_call(PyObject *self, PyObject *args, PyObject *kwargs); **Inheritance:** This field is inherited by subtypes. reprfunc PyTypeObject.tp_str An optional pointer to a function that implements the built-in operation "str()". (Note that "str" is a type now, and "str()" calls the constructor for that type. This constructor calls "PyObject_Str()" to do the actual work, and "PyObject_Str()" will call this handler.) The signature is the same as for "PyObject_Str()": PyObject *tp_str(PyObject *self); The function must return a string or a Unicode object. It should be a “friendly” string representation of the object, as this is the representation that will be used, among other things, by the "print()" function. **Inheritance:** This field is inherited by subtypes. **Default:** When this field is not set, "PyObject_Repr()" is called to return a string representation. getattrofunc PyTypeObject.tp_getattro An optional pointer to the get-attribute function. The signature is the same as for "PyObject_GetAttr()": PyObject *tp_getattro(PyObject *self, PyObject *attr); It is usually convenient to set this field to "PyObject_GenericGetAttr()", which implements the normal way of looking for object attributes. **Inheritance:** Group: "tp_getattr", "tp_getattro" This field is inherited by subtypes together with "tp_getattr": a subtype inherits both "tp_getattr" and "tp_getattro" from its base type when the subtype’s "tp_getattr" and "tp_getattro" are both "NULL". **Default:** "PyBaseObject_Type" uses "PyObject_GenericGetAttr()". 
setattrofunc PyTypeObject.tp_setattro An optional pointer to the function for setting and deleting attributes. The signature is the same as for "PyObject_SetAttr()":

   int tp_setattro(PyObject *self, PyObject *attr, PyObject *value);

In addition, setting *value* to "NULL" to delete an attribute must be supported. It is usually convenient to set this field to "PyObject_GenericSetAttr()", which implements the normal way of setting object attributes. **Inheritance:** Group: "tp_setattr", "tp_setattro" This field is inherited by subtypes together with "tp_setattr": a subtype inherits both "tp_setattr" and "tp_setattro" from its base type when the subtype’s "tp_setattr" and "tp_setattro" are both "NULL". **Default:** "PyBaseObject_Type" uses "PyObject_GenericSetAttr()".

PyBufferProcs *PyTypeObject.tp_as_buffer Pointer to an additional structure that contains fields relevant only to objects which implement the buffer interface. These fields are documented in Buffer Object Structures. **Inheritance:** The "tp_as_buffer" field is not inherited, but the contained fields are inherited individually.

unsigned long PyTypeObject.tp_flags This field is a bit mask of various flags. Some flags indicate variant semantics for certain situations; others are used to indicate that certain fields in the type object (or in the extension structures referenced via "tp_as_number", "tp_as_sequence", "tp_as_mapping", and "tp_as_buffer") that were historically not always present are valid; if such a flag bit is clear, the type fields it guards must not be accessed and must be considered to have a zero or "NULL" value instead. **Inheritance:** Inheritance of this field is complicated. Most flag bits are inherited individually, i.e. if the base type has a flag bit set, the subtype inherits this flag bit. The flag bits that pertain to extension structures are strictly inherited if the extension structure is inherited, i.e. the base type’s value of the flag bit is copied into the subtype together with a pointer to the extension structure. The "Py_TPFLAGS_HAVE_GC" flag bit is inherited together with the "tp_traverse" and "tp_clear" fields: if the "Py_TPFLAGS_HAVE_GC" flag bit is clear in the subtype and the subtype’s "tp_traverse" and "tp_clear" fields are "NULL", all three are inherited from the base type. **Default:** "PyBaseObject_Type" uses "Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE". **Bit Masks:** The following bit masks are currently defined; these can be ORed together using the "|" operator to form the value of the "tp_flags" field. The macro "PyType_HasFeature()" takes a type and a flags value, *tp* and *f*, and checks whether "tp->tp_flags & f" is non-zero.

Py_TPFLAGS_HEAPTYPE This bit is set when the type object itself is allocated on the heap, for example, types created dynamically using "PyType_FromSpec()". In this case, the "ob_type" field of its instances is considered a reference to the type, and the type object is INCREF’ed when a new instance is created, and DECREF’ed when an instance is destroyed (this does not apply to instances of subtypes; only the type referenced by the instance’s ob_type gets INCREF’ed or DECREF’ed). Heap types should also support garbage collection as they can form a reference cycle with their own module object. **Inheritance:** ???

Py_TPFLAGS_BASETYPE This bit is set when the type can be used as the base type of another type. If this bit is clear, the type cannot be subtyped (similar to a “final” class in Java). **Inheritance:** ???
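C code should test these bits through "PyType_HasFeature()" rather than reading "tp_flags" directly. A hedged sketch, with an illustrative helper name that is not part of the API:

   #include <Python.h>

   /* Illustrative helper: refuse a candidate base type that is final, and
      only touch the GC slots when the guarding flag is set. */
   static int
   check_base(PyTypeObject *tp)
   {
       if (!PyType_HasFeature(tp, Py_TPFLAGS_BASETYPE)) {
           PyErr_Format(PyExc_TypeError,
                        "type '%s' cannot be subtyped", tp->tp_name);
           return -1;
       }
       if (PyType_HasFeature(tp, Py_TPFLAGS_HAVE_GC)) {
           /* tp->tp_traverse and tp->tp_clear may safely be accessed here. */
       }
       return 0;
   }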
Py_TPFLAGS_READY This bit is set when the type object has been fully initialized by "PyType_Ready()". **Inheritance:** ???

Py_TPFLAGS_READYING This bit is set while "PyType_Ready()" is in the process of initializing the type object. **Inheritance:** ???

Py_TPFLAGS_HAVE_GC This bit is set when the object supports garbage collection. If this bit is set, instances must be created using "PyObject_GC_New" and destroyed using "PyObject_GC_Del()". More information can be found in section Supporting Cyclic Garbage Collection. This bit also implies that the GC-related fields "tp_traverse" and "tp_clear" are present in the type object. **Inheritance:** Group: "Py_TPFLAGS_HAVE_GC", "tp_traverse", "tp_clear" The "Py_TPFLAGS_HAVE_GC" flag bit is inherited together with the "tp_traverse" and "tp_clear" fields: if the "Py_TPFLAGS_HAVE_GC" flag bit is clear in the subtype and the subtype’s "tp_traverse" and "tp_clear" fields are "NULL", all three are inherited from the base type.

Py_TPFLAGS_DEFAULT This is a bitmask of all the bits that pertain to the existence of certain fields in the type object and its extension structures. Currently, it includes the following bits: "Py_TPFLAGS_HAVE_STACKLESS_EXTENSION". **Inheritance:** ???

Py_TPFLAGS_METHOD_DESCRIPTOR This bit indicates that objects behave like unbound methods. If this flag is set for "type(meth)", then:

   * "meth.__get__(obj, cls)(*args, **kwds)" (with "obj" not None) must be equivalent to "meth(obj, *args, **kwds)".

   * "meth.__get__(None, cls)(*args, **kwds)" must be equivalent to "meth(*args, **kwds)".

This flag enables an optimization for typical method calls like "obj.meth()": it avoids creating a temporary “bound method” object for "obj.meth". Added in version 3.8. **Inheritance:** This flag is never inherited by types without the "Py_TPFLAGS_IMMUTABLETYPE" flag set. For extension types, it is inherited whenever "tp_descr_get" is inherited.

Py_TPFLAGS_MANAGED_DICT This bit indicates that instances of the class have a "__dict__" attribute, and that the space for the dictionary is managed by the VM. If this flag is set, "Py_TPFLAGS_HAVE_GC" should also be set. The type traverse function must call "PyObject_VisitManagedDict()" and its clear function must call "PyObject_ClearManagedDict()". Added in version 3.12. **Inheritance:** This flag is inherited unless the "tp_dictoffset" field is set in a superclass.

Py_TPFLAGS_MANAGED_WEAKREF This bit indicates that instances of the class should be weakly referenceable. Added in version 3.12. **Inheritance:** This flag is inherited unless the "tp_weaklistoffset" field is set in a superclass.

Py_TPFLAGS_ITEMS_AT_END Only usable with variable-size types, i.e. ones with non-zero "tp_itemsize". Indicates that the variable-sized portion of an instance of this type is at the end of the instance’s memory area, at an offset of "Py_TYPE(obj)->tp_basicsize" (which may be different in each subclass). When setting this flag, be sure that all superclasses either use this memory layout, or are not variable-sized. Python does not check this. Added in version 3.12. **Inheritance:** This flag is inherited.

Py_TPFLAGS_LONG_SUBCLASS Py_TPFLAGS_LIST_SUBCLASS Py_TPFLAGS_TUPLE_SUBCLASS Py_TPFLAGS_BYTES_SUBCLASS Py_TPFLAGS_UNICODE_SUBCLASS Py_TPFLAGS_DICT_SUBCLASS Py_TPFLAGS_BASE_EXC_SUBCLASS Py_TPFLAGS_TYPE_SUBCLASS These flags are used by functions such as "PyLong_Check()" to quickly determine if a type is a subclass of a built-in type; such specific checks are faster than a generic check, like "PyObject_IsInstance()".
Custom types that inherit from built-ins should have their "tp_flags" set appropriately, or the code that interacts with such types will behave differently depending on what kind of check is used.

Py_TPFLAGS_HAVE_FINALIZE This bit is set when the "tp_finalize" slot is present in the type structure. Added in version 3.4. Deprecated since version 3.8: This flag isn’t necessary anymore, as the interpreter assumes the "tp_finalize" slot is always present in the type structure.

Py_TPFLAGS_HAVE_VECTORCALL This bit is set when the class implements the vectorcall protocol. See "tp_vectorcall_offset" for details. **Inheritance:** This bit is inherited if "tp_call" is also inherited. Added in version 3.9. Changed in version 3.12: This flag is now removed from a class when the class’s "__call__()" method is reassigned. This flag can now be inherited by mutable classes.

Py_TPFLAGS_IMMUTABLETYPE This bit is set for type objects that are immutable: type attributes cannot be set nor deleted. "PyType_Ready()" automatically applies this flag to static types. **Inheritance:** This flag is not inherited. Added in version 3.10.

Py_TPFLAGS_DISALLOW_INSTANTIATION Disallow creating instances of the type: set "tp_new" to NULL and don’t create the "__new__" key in the type dictionary. The flag must be set before creating the type, not after. For example, it must be set before "PyType_Ready()" is called on the type. The flag is set automatically on static types if "tp_base" is NULL or "&PyBaseObject_Type" and "tp_new" is NULL. **Inheritance:** This flag is not inherited. However, subclasses will not be instantiable unless they provide a non-NULL "tp_new" (which is only possible via the C API). Note: To disallow instantiating a class directly but allow instantiating its subclasses (e.g. for an *abstract base class*), do not use this flag. Instead, make "tp_new" only succeed for subclasses. Added in version 3.10.

Py_TPFLAGS_MAPPING This bit indicates that instances of the class may match mapping patterns when used as the subject of a "match" block. It is automatically set when registering or subclassing "collections.abc.Mapping", and unset when registering "collections.abc.Sequence". Note: "Py_TPFLAGS_MAPPING" and "Py_TPFLAGS_SEQUENCE" are mutually exclusive; it is an error to enable both flags simultaneously. **Inheritance:** This flag is inherited by types that do not already set "Py_TPFLAGS_SEQUENCE". See also: **PEP 634** – Structural Pattern Matching: Specification Added in version 3.10.

Py_TPFLAGS_SEQUENCE This bit indicates that instances of the class may match sequence patterns when used as the subject of a "match" block. It is automatically set when registering or subclassing "collections.abc.Sequence", and unset when registering "collections.abc.Mapping". Note: "Py_TPFLAGS_MAPPING" and "Py_TPFLAGS_SEQUENCE" are mutually exclusive; it is an error to enable both flags simultaneously. **Inheritance:** This flag is inherited by types that do not already set "Py_TPFLAGS_MAPPING". See also: **PEP 634** – Structural Pattern Matching: Specification Added in version 3.10.

Py_TPFLAGS_VALID_VERSION_TAG Internal. Do not set or unset this flag. To indicate that a class has changed, call "PyType_Modified()". Warning: This flag is present in header files, but is not being used. It will be removed in a future version of CPython.

const char *PyTypeObject.tp_doc An optional pointer to a NUL-terminated C string giving the docstring for this type object.
This is exposed as the "__doc__" attribute on the type and instances of the type. **Inheritance:** This field is *not* inherited by subtypes. traverseproc PyTypeObject.tp_traverse An optional pointer to a traversal function for the garbage collector. This is only used if the "Py_TPFLAGS_HAVE_GC" flag bit is set. The signature is: int tp_traverse(PyObject *self, visitproc visit, void *arg); More information about Python’s garbage collection scheme can be found in section Supporting Cyclic Garbage Collection. The "tp_traverse" pointer is used by the garbage collector to detect reference cycles. A typical implementation of a "tp_traverse" function simply calls "Py_VISIT()" on each of the instance’s members that are Python objects that the instance owns. For example, this is function "local_traverse()" from the "_thread" extension module: static int local_traverse(localobject *self, visitproc visit, void *arg) { Py_VISIT(self->args); Py_VISIT(self->kw); Py_VISIT(self->dict); return 0; } Note that "Py_VISIT()" is called only on those members that can participate in reference cycles. Although there is also a "self->key" member, it can only be "NULL" or a Python string and therefore cannot be part of a reference cycle. On the other hand, even if you know a member can never be part of a cycle, as a debugging aid you may want to visit it anyway just so the "gc" module’s "get_referents()" function will include it. Heap types ("Py_TPFLAGS_HEAPTYPE") must visit their type with: Py_VISIT(Py_TYPE(self)); It is only needed since Python 3.9. To support Python 3.8 and older, this line must be conditional: #if PY_VERSION_HEX >= 0x03090000 Py_VISIT(Py_TYPE(self)); #endif If the "Py_TPFLAGS_MANAGED_DICT" bit is set in the "tp_flags" field, the traverse function must call "PyObject_VisitManagedDict()" like this: PyObject_VisitManagedDict((PyObject*)self, visit, arg); Warning: When implementing "tp_traverse", only the members that the instance *owns* (by having *strong references* to them) must be visited. For instance, if an object supports weak references via the "tp_weaklist" slot, the pointer supporting the linked list (what *tp_weaklist* points to) must **not** be visited as the instance does not directly own the weak references to itself (the weakreference list is there to support the weak reference machinery, but the instance has no strong reference to the elements inside it, as they are allowed to be removed even if the instance is still alive). Note that "Py_VISIT()" requires the *visit* and *arg* parameters to "local_traverse()" to have these specific names; don’t name them just anything. Instances of heap-allocated types hold a reference to their type. Their traversal function must therefore either visit "Py_TYPE(self)", or delegate this responsibility by calling "tp_traverse" of another heap-allocated type (such as a heap- allocated superclass). If they do not, the type object may not be garbage-collected. Changed in version 3.9: Heap-allocated types are expected to visit "Py_TYPE(self)" in "tp_traverse". In earlier versions of Python, due to bug 40217, doing this may lead to crashes in subclasses. **Inheritance:** Group: "Py_TPFLAGS_HAVE_GC", "tp_traverse", "tp_clear" This field is inherited by subtypes together with "tp_clear" and the "Py_TPFLAGS_HAVE_GC" flag bit: the flag bit, "tp_traverse", and "tp_clear" are all inherited from the base type if they are all zero in the subtype. inquiry PyTypeObject.tp_clear An optional pointer to a clear function for the garbage collector. 
This is only used if the "Py_TPFLAGS_HAVE_GC" flag bit is set. The signature is: int tp_clear(PyObject *); The "tp_clear" member function is used to break reference cycles in cyclic garbage detected by the garbage collector. Taken together, all "tp_clear" functions in the system must combine to break all reference cycles. This is subtle, and if in any doubt supply a "tp_clear" function. For example, the tuple type does not implement a "tp_clear" function, because it’s possible to prove that no reference cycle can be composed entirely of tuples. Therefore the "tp_clear" functions of other types must be sufficient to break any cycle containing a tuple. This isn’t immediately obvious, and there’s rarely a good reason to avoid implementing "tp_clear". Implementations of "tp_clear" should drop the instance’s references to those of its members that may be Python objects, and set its pointers to those members to "NULL", as in the following example: static int local_clear(localobject *self) { Py_CLEAR(self->key); Py_CLEAR(self->args); Py_CLEAR(self->kw); Py_CLEAR(self->dict); return 0; } The "Py_CLEAR()" macro should be used, because clearing references is delicate: the reference to the contained object must not be released (via "Py_DECREF()") until after the pointer to the contained object is set to "NULL". This is because releasing the reference may cause the contained object to become trash, triggering a chain of reclamation activity that may include invoking arbitrary Python code (due to finalizers, or weakref callbacks, associated with the contained object). If it’s possible for such code to reference *self* again, it’s important that the pointer to the contained object be "NULL" at that time, so that *self* knows the contained object can no longer be used. The "Py_CLEAR()" macro performs the operations in a safe order. If the "Py_TPFLAGS_MANAGED_DICT" bit is set in the "tp_flags" field, the traverse function must call "PyObject_ClearManagedDict()" like this: PyObject_ClearManagedDict((PyObject*)self); Note that "tp_clear" is not *always* called before an instance is deallocated. For example, when reference counting is enough to determine that an object is no longer used, the cyclic garbage collector is not involved and "tp_dealloc" is called directly. Because the goal of "tp_clear" functions is to break reference cycles, it’s not necessary to clear contained objects like Python strings or Python integers, which can’t participate in reference cycles. On the other hand, it may be convenient to clear all contained Python objects, and write the type’s "tp_dealloc" function to invoke "tp_clear". More information about Python’s garbage collection scheme can be found in section Supporting Cyclic Garbage Collection. **Inheritance:** Group: "Py_TPFLAGS_HAVE_GC", "tp_traverse", "tp_clear" This field is inherited by subtypes together with "tp_traverse" and the "Py_TPFLAGS_HAVE_GC" flag bit: the flag bit, "tp_traverse", and "tp_clear" are all inherited from the base type if they are all zero in the subtype. richcmpfunc PyTypeObject.tp_richcompare An optional pointer to the rich comparison function, whose signature is: PyObject *tp_richcompare(PyObject *self, PyObject *other, int op); The first parameter is guaranteed to be an instance of the type that is defined by "PyTypeObject". The function should return the result of the comparison (usually "Py_True" or "Py_False"). 
If the comparison is undefined, it must return "Py_NotImplemented"; if another error occurs, it must return "NULL" and set an exception condition. The following constants are defined to be used as the third argument for "tp_richcompare" and for "PyObject_RichCompare()":

   +----------------------+--------------+
   | Constant             | Comparison   |
   |======================|==============|
   | Py_LT                | "<"          |
   +----------------------+--------------+
   | Py_LE                | "<="         |
   +----------------------+--------------+
   | Py_EQ                | "=="         |
   +----------------------+--------------+
   | Py_NE                | "!="         |
   +----------------------+--------------+
   | Py_GT                | ">"          |
   +----------------------+--------------+
   | Py_GE                | ">="         |
   +----------------------+--------------+

The following macro is defined to ease writing rich comparison functions:

Py_RETURN_RICHCOMPARE(VAL_A, VAL_B, op) Return "Py_True" or "Py_False" from the function, depending on the result of a comparison. VAL_A and VAL_B must be orderable by C comparison operators (for example, they may be C ints or floats). The third argument specifies the requested operation, as for "PyObject_RichCompare()". The returned value is a new *strong reference*. On error, sets an exception and returns "NULL" from the function. Added in version 3.7.

**Inheritance:** Group: "tp_hash", "tp_richcompare" This field is inherited by subtypes together with "tp_hash": a subtype inherits "tp_richcompare" and "tp_hash" when the subtype’s "tp_richcompare" and "tp_hash" are both "NULL". **Default:** "PyBaseObject_Type" provides a "tp_richcompare" implementation, which may be inherited. However, if only "tp_hash" is defined, not even the inherited function is used and instances of the type will not be able to participate in any comparisons.

Py_ssize_t PyTypeObject.tp_weaklistoffset While this field is still supported, "Py_TPFLAGS_MANAGED_WEAKREF" should be used instead, if at all possible. If the instances of this type are weakly referenceable, this field is greater than zero and contains the offset in the instance structure of the weak reference list head (ignoring the GC header, if present); this offset is used by "PyObject_ClearWeakRefs()" and the "PyWeakref_*" functions. The instance structure needs to include a field of type PyObject* which is initialized to "NULL". Do not confuse this field with "tp_weaklist"; that is the list head for weak references to the type object itself. It is an error to set both the "Py_TPFLAGS_MANAGED_WEAKREF" bit and "tp_weaklistoffset". **Inheritance:** This field is inherited by subtypes, but see the rules listed below. A subtype may override this offset; this means that the subtype uses a different weak reference list head than the base type. Since the list head is always found via "tp_weaklistoffset", this should not be a problem. **Default:** If the "Py_TPFLAGS_MANAGED_WEAKREF" bit is set in the "tp_flags" field, then "tp_weaklistoffset" will be set to a negative value, to indicate that it is unsafe to use this field.

getiterfunc PyTypeObject.tp_iter An optional pointer to a function that returns an *iterator* for the object. Its presence normally signals that the instances of this type are *iterable* (although sequences may be iterable without this function). This function has the same signature as "PyObject_GetIter()":

   PyObject *tp_iter(PyObject *self);

**Inheritance:** This field is inherited by subtypes.

iternextfunc PyTypeObject.tp_iternext An optional pointer to a function that returns the next item in an *iterator*.
The signature is:

   PyObject *tp_iternext(PyObject *self);

When the iterator is exhausted, it must return "NULL"; a "StopIteration" exception may or may not be set. When another error occurs, it must return "NULL" too. Its presence signals that the instances of this type are iterators. Iterator types should also define the "tp_iter" function, and that function should return the iterator instance itself (not a new iterator instance). This function has the same signature as "PyIter_Next()". **Inheritance:** This field is inherited by subtypes.

struct PyMethodDef *PyTypeObject.tp_methods An optional pointer to a static "NULL"-terminated array of "PyMethodDef" structures, declaring regular methods of this type. For each entry in the array, an entry is added to the type’s dictionary (see "tp_dict" below) containing a method descriptor. **Inheritance:** This field is not inherited by subtypes (methods are inherited through a different mechanism).

struct PyMemberDef *PyTypeObject.tp_members An optional pointer to a static "NULL"-terminated array of "PyMemberDef" structures, declaring regular data members (fields or slots) of instances of this type. For each entry in the array, an entry is added to the type’s dictionary (see "tp_dict" below) containing a member descriptor. **Inheritance:** This field is not inherited by subtypes (members are inherited through a different mechanism).

struct PyGetSetDef *PyTypeObject.tp_getset An optional pointer to a static "NULL"-terminated array of "PyGetSetDef" structures, declaring computed attributes of instances of this type. For each entry in the array, an entry is added to the type’s dictionary (see "tp_dict" below) containing a getset descriptor. **Inheritance:** This field is not inherited by subtypes (computed attributes are inherited through a different mechanism).

PyTypeObject *PyTypeObject.tp_base An optional pointer to a base type from which type properties are inherited. At this level, only single inheritance is supported; multiple inheritance requires dynamically creating a type object by calling the metatype. Note: Slot initialization is subject to the rules of initializing globals. C99 requires the initializers to be “address constants”. Function designators like "PyType_GenericNew()", with implicit conversion to a pointer, are valid C99 address constants. However, the unary ‘&’ operator applied to a non-static variable like "PyBaseObject_Type" is not required to produce an address constant. Compilers may support this (gcc does); MSVC does not. Both compilers are strictly standard conforming in this particular behavior. Consequently, "tp_base" should be set in the extension module’s init function. **Inheritance:** This field is not inherited by subtypes (obviously). **Default:** This field defaults to "&PyBaseObject_Type" (which to Python programmers is known as the type "object").

PyObject *PyTypeObject.tp_dict The type’s dictionary is stored here by "PyType_Ready()". This field should normally be initialized to "NULL" before PyType_Ready is called; it may also be initialized to a dictionary containing initial attributes for the type. Once "PyType_Ready()" has initialized the type, extra attributes for the type may be added to this dictionary only if they don’t correspond to overloaded operations (like "__add__()"). Once initialization for the type has finished, this field should be treated as read-only. Some types may not store their dictionary in this slot. Use "PyType_GetDict()" to retrieve the dictionary for an arbitrary type.
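A hedged sketch of that access pattern (Python 3.12 and later; the helper name is illustrative, not part of the API):

   #include <Python.h>

   /* Look up a name in a type's dictionary via PyType_GetDict()
      rather than reading tp_dict directly. */
   static PyObject *
   get_type_attr(PyTypeObject *tp, const char *name)
   {
       PyObject *dict = PyType_GetDict(tp);  /* new reference */
       if (dict == NULL) {
           return NULL;
       }
       PyObject *value = PyDict_GetItemString(dict, name);  /* borrowed */
       Py_XINCREF(value);
       Py_DECREF(dict);
       if (value == NULL && !PyErr_Occurred()) {
           PyErr_SetString(PyExc_AttributeError, name);
       }
       return value;
   }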
Changed in version 3.12: Internals detail: For static builtin types, this is always "NULL". Instead, the dict for such types is stored on "PyInterpreterState". Use "PyType_GetDict()" to get the dict for an arbitrary type. **Inheritance:** This field is not inherited by subtypes (though the attributes defined in here are inherited through a different mechanism). **Default:** If this field is "NULL", "PyType_Ready()" will assign a new dictionary to it. Warning: It is not safe to use "PyDict_SetItem()" on or otherwise modify "tp_dict" with the dictionary C-API.

descrgetfunc PyTypeObject.tp_descr_get An optional pointer to a “descriptor get” function. The function signature is:

   PyObject * tp_descr_get(PyObject *self, PyObject *obj, PyObject *type);

**Inheritance:** This field is inherited by subtypes.

descrsetfunc PyTypeObject.tp_descr_set An optional pointer to a function for setting and deleting a descriptor’s value. The function signature is:

   int tp_descr_set(PyObject *self, PyObject *obj, PyObject *value);

The *value* argument is set to "NULL" to delete the value. **Inheritance:** This field is inherited by subtypes.

Py_ssize_t PyTypeObject.tp_dictoffset While this field is still supported, "Py_TPFLAGS_MANAGED_DICT" should be used instead, if at all possible. If the instances of this type have a dictionary containing instance variables, this field is non-zero and contains the offset in the instances of the type of the instance variable dictionary; this offset is used by "PyObject_GenericGetAttr()". Do not confuse this field with "tp_dict"; that is the dictionary for attributes of the type object itself. The value specifies the offset of the dictionary from the start of the instance structure. The "tp_dictoffset" should be regarded as write-only. To get the pointer to the dictionary call "PyObject_GenericGetDict()". Calling "PyObject_GenericGetDict()" may need to allocate memory for the dictionary, so it may be more efficient to call "PyObject_GetAttr()" when accessing an attribute on the object. It is an error to set both the "Py_TPFLAGS_MANAGED_DICT" bit and "tp_dictoffset". **Inheritance:** This field is inherited by subtypes. A subtype should not override this offset; doing so could be unsafe, if C code tries to access the dictionary at the previous offset. To properly support inheritance, use "Py_TPFLAGS_MANAGED_DICT". **Default:** This slot has no default. For static types, if the field is "NULL" then no "__dict__" gets created for instances. If the "Py_TPFLAGS_MANAGED_DICT" bit is set in the "tp_flags" field, then "tp_dictoffset" will be set to "-1", to indicate that it is unsafe to use this field.

initproc PyTypeObject.tp_init An optional pointer to an instance initialization function. This function corresponds to the "__init__()" method of classes. Like "__init__()", it is possible to create an instance without calling "__init__()", and it is possible to reinitialize an instance by calling its "__init__()" method again. The function signature is:

   int tp_init(PyObject *self, PyObject *args, PyObject *kwds);

The self argument is the instance to be initialized; the *args* and *kwds* arguments represent positional and keyword arguments of the call to "__init__()". The "tp_init" function, if not "NULL", is called when an instance is created normally by calling its type, after the type’s "tp_new" function has returned an instance of the type.
If the "tp_new" function returns an instance of some other type that is not a subtype of the original type, no "tp_init" function is called; if "tp_new" returns an instance of a subtype of the original type, the subtype’s "tp_init" is called. Returns "0" on success, "-1" and sets an exception on error. **Inheritance:** This field is inherited by subtypes. **Default:** For static types this field does not have a default. allocfunc PyTypeObject.tp_alloc An optional pointer to an instance allocation function. The function signature is: PyObject *tp_alloc(PyTypeObject *self, Py_ssize_t nitems); **Inheritance:** This field is inherited by static subtypes, but not by dynamic subtypes (subtypes created by a class statement). **Default:** For dynamic subtypes, this field is always set to "PyType_GenericAlloc()", to force a standard heap allocation strategy. For static subtypes, "PyBaseObject_Type" uses "PyType_GenericAlloc()". That is the recommended value for all statically defined types. newfunc PyTypeObject.tp_new An optional pointer to an instance creation function. The function signature is: PyObject *tp_new(PyTypeObject *subtype, PyObject *args, PyObject *kwds); The *subtype* argument is the type of the object being created; the *args* and *kwds* arguments represent positional and keyword arguments of the call to the type. Note that *subtype* doesn’t have to equal the type whose "tp_new" function is called; it may be a subtype of that type (but not an unrelated type). The "tp_new" function should call "subtype->tp_alloc(subtype, nitems)" to allocate space for the object, and then do only as much further initialization as is absolutely necessary. Initialization that can safely be ignored or repeated should be placed in the "tp_init" handler. A good rule of thumb is that for immutable types, all initialization should take place in "tp_new", while for mutable types, most initialization should be deferred to "tp_init". Set the "Py_TPFLAGS_DISALLOW_INSTANTIATION" flag to disallow creating instances of the type in Python. **Inheritance:** This field is inherited by subtypes, except it is not inherited by static types whose "tp_base" is "NULL" or "&PyBaseObject_Type". **Default:** For static types this field has no default. This means if the slot is defined as "NULL", the type cannot be called to create new instances; presumably there is some other way to create instances, like a factory function. freefunc PyTypeObject.tp_free An optional pointer to an instance deallocation function. Its signature is: void tp_free(void *self); An initializer that is compatible with this signature is "PyObject_Free()". **Inheritance:** This field is inherited by static subtypes, but not by dynamic subtypes (subtypes created by a class statement) **Default:** In dynamic subtypes, this field is set to a deallocator suitable to match "PyType_GenericAlloc()" and the value of the "Py_TPFLAGS_HAVE_GC" flag bit. For static subtypes, "PyBaseObject_Type" uses "PyObject_Del()". inquiry PyTypeObject.tp_is_gc An optional pointer to a function called by the garbage collector. The garbage collector needs to know whether a particular object is collectible or not. Normally, it is sufficient to look at the object’s type’s "tp_flags" field, and check the "Py_TPFLAGS_HAVE_GC" flag bit. But some types have a mixture of statically and dynamically allocated instances, and the statically allocated instances are not collectible. 
Such types should define this function; it should return "1" for a collectible instance, and "0" for a non-collectible instance. The signature is:

   int tp_is_gc(PyObject *self);

(The only example of this is types themselves. The metatype, "PyType_Type", defines this function to distinguish between statically and dynamically allocated types.) **Inheritance:** This field is inherited by subtypes. **Default:** This slot has no default. If this field is "NULL", "Py_TPFLAGS_HAVE_GC" is used as the functional equivalent.

PyObject *PyTypeObject.tp_bases Tuple of base types. This field should be set to "NULL" and treated as read-only. Python will fill it in when the type is "initialized". For dynamically created classes, the "Py_tp_bases" "slot" can be used instead of the *bases* argument of "PyType_FromSpecWithBases()". The argument form is preferred. Warning: Multiple inheritance does not work well for statically defined types. If you set "tp_bases" to a tuple, Python will not raise an error, but some slots will only be inherited from the first base. **Inheritance:** This field is not inherited.

PyObject *PyTypeObject.tp_mro Tuple containing the expanded set of base types, starting with the type itself and ending with "object", in Method Resolution Order. This field should be set to "NULL" and treated as read-only. Python will fill it in when the type is "initialized". **Inheritance:** This field is not inherited; it is calculated fresh by "PyType_Ready()".

PyObject *PyTypeObject.tp_cache Unused. Internal use only. **Inheritance:** This field is not inherited.

void *PyTypeObject.tp_subclasses A collection of subclasses. Internal use only. May be an invalid pointer. To get a list of subclasses, call the Python method "__subclasses__()". Changed in version 3.12: For some types, this field does not hold a valid PyObject*. The type was changed to void* to indicate this. **Inheritance:** This field is not inherited.

PyObject *PyTypeObject.tp_weaklist Weak reference list head, for weak references to this type object. Not inherited. Internal use only. Changed in version 3.12: Internals detail: For the static builtin types this is always "NULL", even if weakrefs are added. Instead, the weakrefs for each are stored on "PyInterpreterState". Use the public C-API or the internal "_PyObject_GET_WEAKREFS_LISTPTR()" macro to avoid the distinction. **Inheritance:** This field is not inherited.

destructor PyTypeObject.tp_del This field is deprecated. Use "tp_finalize" instead.

unsigned int PyTypeObject.tp_version_tag Used to index into the method cache. Internal use only. **Inheritance:** This field is not inherited.

destructor PyTypeObject.tp_finalize An optional pointer to an instance finalization function. Its signature is:

   void tp_finalize(PyObject *self);

If "tp_finalize" is set, the interpreter calls it once when finalizing an instance. It is called either from the garbage collector (if the instance is part of an isolated reference cycle) or just before the object is deallocated. Either way, it is guaranteed to be called before attempting to break reference cycles, ensuring that it finds the object in a sane state. "tp_finalize" should not mutate the current exception status; therefore, a recommended way to write a non-trivial finalizer is:

   static void
   local_finalize(PyObject *self)
   {
       /* Save the current exception, if any. */
       PyObject *exc = PyErr_GetRaisedException();

       /* ... */

       /* Restore the saved exception. */
       PyErr_SetRaisedException(exc);
   }

**Inheritance:** This field is inherited by subtypes.
Added in version 3.4. Changed in version 3.8: Before version 3.8 it was necessary to set the "Py_TPFLAGS_HAVE_FINALIZE" flags bit in order for this field to be used. This is no longer required. See also: “Safe object finalization” (**PEP 442**) vectorcallfunc PyTypeObject.tp_vectorcall Vectorcall function to use for calls of this type object. In other words, it is used to implement vectorcall for "type.__call__". If "tp_vectorcall" is "NULL", the default call implementation using "__new__()" and "__init__()" is used. **Inheritance:** This field is never inherited. Added in version 3.9: (the field exists since 3.8 but it’s only used since 3.9) unsigned char PyTypeObject.tp_watched Internal. Do not use. Added in version 3.12. Static Types ============ Traditionally, types defined in C code are *static*, that is, a static "PyTypeObject" structure is defined directly in code and initialized using "PyType_Ready()". This results in types that are limited relative to types defined in Python: * Static types are limited to one base, i.e. they cannot use multiple inheritance. * Static type objects (but not necessarily their instances) are immutable. It is not possible to add or modify the type object’s attributes from Python. * Static type objects are shared across sub-interpreters, so they should not include any subinterpreter-specific state. Also, since "PyTypeObject" is only part of the Limited API as an opaque struct, any extension modules using static types must be compiled for a specific Python minor version. Heap Types ========== An alternative to static types is *heap-allocated types*, or *heap types* for short, which correspond closely to classes created by Python’s "class" statement. Heap types have the "Py_TPFLAGS_HEAPTYPE" flag set. This is done by filling a "PyType_Spec" structure and calling "PyType_FromSpec()", "PyType_FromSpecWithBases()", "PyType_FromModuleAndSpec()", or "PyType_FromMetaclass()". Number Object Structures ======================== type PyNumberMethods This structure holds pointers to the functions which an object uses to implement the number protocol. Each function is used by the function of similar name documented in the Number Protocol section. Here is the structure definition: typedef struct { binaryfunc nb_add; binaryfunc nb_subtract; binaryfunc nb_multiply; binaryfunc nb_remainder; binaryfunc nb_divmod; ternaryfunc nb_power; unaryfunc nb_negative; unaryfunc nb_positive; unaryfunc nb_absolute; inquiry nb_bool; unaryfunc nb_invert; binaryfunc nb_lshift; binaryfunc nb_rshift; binaryfunc nb_and; binaryfunc nb_xor; binaryfunc nb_or; unaryfunc nb_int; void *nb_reserved; unaryfunc nb_float; binaryfunc nb_inplace_add; binaryfunc nb_inplace_subtract; binaryfunc nb_inplace_multiply; binaryfunc nb_inplace_remainder; ternaryfunc nb_inplace_power; binaryfunc nb_inplace_lshift; binaryfunc nb_inplace_rshift; binaryfunc nb_inplace_and; binaryfunc nb_inplace_xor; binaryfunc nb_inplace_or; binaryfunc nb_floor_divide; binaryfunc nb_true_divide; binaryfunc nb_inplace_floor_divide; binaryfunc nb_inplace_true_divide; unaryfunc nb_index; binaryfunc nb_matrix_multiply; binaryfunc nb_inplace_matrix_multiply; } PyNumberMethods; Note: Binary and ternary functions must check the type of all their operands, and implement the necessary conversions (at least one of the operands is an instance of the defined type). 
If the operation is not defined for the given operands, binary and ternary functions must return "Py_NotImplemented", if another error occurred they must return "NULL" and set an exception. Note: The "nb_reserved" field should always be "NULL". It was previously called "nb_long", and was renamed in Python 3.0.1. binaryfunc PyNumberMethods.nb_add binaryfunc PyNumberMethods.nb_subtract binaryfunc PyNumberMethods.nb_multiply binaryfunc PyNumberMethods.nb_remainder binaryfunc PyNumberMethods.nb_divmod ternaryfunc PyNumberMethods.nb_power unaryfunc PyNumberMethods.nb_negative unaryfunc PyNumberMethods.nb_positive unaryfunc PyNumberMethods.nb_absolute inquiry PyNumberMethods.nb_bool unaryfunc PyNumberMethods.nb_invert binaryfunc PyNumberMethods.nb_lshift binaryfunc PyNumberMethods.nb_rshift binaryfunc PyNumberMethods.nb_and binaryfunc PyNumberMethods.nb_xor binaryfunc PyNumberMethods.nb_or unaryfunc PyNumberMethods.nb_int void *PyNumberMethods.nb_reserved unaryfunc PyNumberMethods.nb_float binaryfunc PyNumberMethods.nb_inplace_add binaryfunc PyNumberMethods.nb_inplace_subtract binaryfunc PyNumberMethods.nb_inplace_multiply binaryfunc PyNumberMethods.nb_inplace_remainder ternaryfunc PyNumberMethods.nb_inplace_power binaryfunc PyNumberMethods.nb_inplace_lshift binaryfunc PyNumberMethods.nb_inplace_rshift binaryfunc PyNumberMethods.nb_inplace_and binaryfunc PyNumberMethods.nb_inplace_xor binaryfunc PyNumberMethods.nb_inplace_or binaryfunc PyNumberMethods.nb_floor_divide binaryfunc PyNumberMethods.nb_true_divide binaryfunc PyNumberMethods.nb_inplace_floor_divide binaryfunc PyNumberMethods.nb_inplace_true_divide unaryfunc PyNumberMethods.nb_index binaryfunc PyNumberMethods.nb_matrix_multiply binaryfunc PyNumberMethods.nb_inplace_matrix_multiply Mapping Object Structures ========================= type PyMappingMethods This structure holds pointers to the functions which an object uses to implement the mapping protocol. It has three members: lenfunc PyMappingMethods.mp_length This function is used by "PyMapping_Size()" and "PyObject_Size()", and has the same signature. This slot may be set to "NULL" if the object has no defined length. binaryfunc PyMappingMethods.mp_subscript This function is used by "PyObject_GetItem()" and "PySequence_GetSlice()", and has the same signature as "PyObject_GetItem()". This slot must be filled for the "PyMapping_Check()" function to return "1", it can be "NULL" otherwise. objobjargproc PyMappingMethods.mp_ass_subscript This function is used by "PyObject_SetItem()", "PyObject_DelItem()", "PySequence_SetSlice()" and "PySequence_DelSlice()". It has the same signature as "PyObject_SetItem()", but *v* can also be set to "NULL" to delete an item. If this slot is "NULL", the object does not support item assignment and deletion. Sequence Object Structures ========================== type PySequenceMethods This structure holds pointers to the functions which an object uses to implement the sequence protocol. lenfunc PySequenceMethods.sq_length This function is used by "PySequence_Size()" and "PyObject_Size()", and has the same signature. It is also used for handling negative indices via the "sq_item" and the "sq_ass_item" slots. binaryfunc PySequenceMethods.sq_concat This function is used by "PySequence_Concat()" and has the same signature. It is also used by the "+" operator, after trying the numeric addition via the "nb_add" slot. ssizeargfunc PySequenceMethods.sq_repeat This function is used by "PySequence_Repeat()" and has the same signature. 
It is also used by the "*" operator, after trying numeric multiplication via the "nb_multiply" slot. ssizeargfunc PySequenceMethods.sq_item This function is used by "PySequence_GetItem()" and has the same signature. It is also used by "PyObject_GetItem()", after trying the subscription via the "mp_subscript" slot. This slot must be filled for the "PySequence_Check()" function to return "1", it can be "NULL" otherwise. Negative indexes are handled as follows: if the "sq_length" slot is filled, it is called and the sequence length is used to compute a positive index which is passed to "sq_item". If "sq_length" is "NULL", the index is passed as is to the function. ssizeobjargproc PySequenceMethods.sq_ass_item This function is used by "PySequence_SetItem()" and has the same signature. It is also used by "PyObject_SetItem()" and "PyObject_DelItem()", after trying the item assignment and deletion via the "mp_ass_subscript" slot. This slot may be left to "NULL" if the object does not support item assignment and deletion. objobjproc PySequenceMethods.sq_contains This function may be used by "PySequence_Contains()" and has the same signature. This slot may be left to "NULL", in this case "PySequence_Contains()" simply traverses the sequence until it finds a match. binaryfunc PySequenceMethods.sq_inplace_concat This function is used by "PySequence_InPlaceConcat()" and has the same signature. It should modify its first operand, and return it. This slot may be left to "NULL", in this case "PySequence_InPlaceConcat()" will fall back to "PySequence_Concat()". It is also used by the augmented assignment "+=", after trying numeric in-place addition via the "nb_inplace_add" slot. ssizeargfunc PySequenceMethods.sq_inplace_repeat This function is used by "PySequence_InPlaceRepeat()" and has the same signature. It should modify its first operand, and return it. This slot may be left to "NULL", in this case "PySequence_InPlaceRepeat()" will fall back to "PySequence_Repeat()". It is also used by the augmented assignment "*=", after trying numeric in-place multiplication via the "nb_inplace_multiply" slot. Buffer Object Structures ======================== type PyBufferProcs This structure holds pointers to the functions required by the Buffer protocol. The protocol defines how an exporter object can expose its internal data to consumer objects. getbufferproc PyBufferProcs.bf_getbuffer The signature of this function is: int (PyObject *exporter, Py_buffer *view, int flags); Handle a request to *exporter* to fill in *view* as specified by *flags*. Except for point (3), an implementation of this function MUST take these steps: 1. Check if the request can be met. If not, raise "BufferError", set view->obj to "NULL" and return "-1". 2. Fill in the requested fields. 3. Increment an internal counter for the number of exports. 4. Set view->obj to *exporter* and increment view->obj. 5. Return "0". If *exporter* is part of a chain or tree of buffer providers, two main schemes can be used: * Re-export: Each member of the tree acts as the exporting object and sets view->obj to a new reference to itself. * Redirect: The buffer request is redirected to the root object of the tree. Here, view->obj will be a new reference to the root object. The individual fields of *view* are described in section Buffer structure, the rules how an exporter must react to specific requests are in section Buffer request types. 
All memory pointed to in the "Py_buffer" structure belongs to the exporter and must remain valid until there are no consumers left. "format", "shape", "strides", "suboffsets" and "internal" are read- only for the consumer. "PyBuffer_FillInfo()" provides an easy way of exposing a simple bytes buffer while dealing correctly with all request types. "PyObject_GetBuffer()" is the interface for the consumer that wraps this function. releasebufferproc PyBufferProcs.bf_releasebuffer The signature of this function is: void (PyObject *exporter, Py_buffer *view); Handle a request to release the resources of the buffer. If no resources need to be released, "PyBufferProcs.bf_releasebuffer" may be "NULL". Otherwise, a standard implementation of this function will take these optional steps: 1. Decrement an internal counter for the number of exports. 2. If the counter is "0", free all memory associated with *view*. The exporter MUST use the "internal" field to keep track of buffer- specific resources. This field is guaranteed to remain constant, while a consumer MAY pass a copy of the original buffer as the *view* argument. This function MUST NOT decrement view->obj, since that is done automatically in "PyBuffer_Release()" (this scheme is useful for breaking reference cycles). "PyBuffer_Release()" is the interface for the consumer that wraps this function. Async Object Structures ======================= Added in version 3.5. type PyAsyncMethods This structure holds pointers to the functions required to implement *awaitable* and *asynchronous iterator* objects. Here is the structure definition: typedef struct { unaryfunc am_await; unaryfunc am_aiter; unaryfunc am_anext; sendfunc am_send; } PyAsyncMethods; unaryfunc PyAsyncMethods.am_await The signature of this function is: PyObject *am_await(PyObject *self); The returned object must be an *iterator*, i.e. "PyIter_Check()" must return "1" for it. This slot may be set to "NULL" if an object is not an *awaitable*. unaryfunc PyAsyncMethods.am_aiter The signature of this function is: PyObject *am_aiter(PyObject *self); Must return an *asynchronous iterator* object. See "__anext__()" for details. This slot may be set to "NULL" if an object does not implement asynchronous iteration protocol. unaryfunc PyAsyncMethods.am_anext The signature of this function is: PyObject *am_anext(PyObject *self); Must return an *awaitable* object. See "__anext__()" for details. This slot may be set to "NULL". sendfunc PyAsyncMethods.am_send The signature of this function is: PySendResult am_send(PyObject *self, PyObject *arg, PyObject **result); See "PyIter_Send()" for details. This slot may be set to "NULL". Added in version 3.10. Slot Type typedefs ================== typedef PyObject *(*allocfunc)(PyTypeObject *cls, Py_ssize_t nitems) * Part of the Stable ABI.* The purpose of this function is to separate memory allocation from memory initialization. It should return a pointer to a block of memory of adequate length for the instance, suitably aligned, and initialized to zeros, but with "ob_refcnt" set to "1" and "ob_type" set to the type argument. If the type’s "tp_itemsize" is non-zero, the object’s "ob_size" field should be initialized to *nitems* and the length of the allocated memory block should be "tp_basicsize + nitems*tp_itemsize", rounded up to a multiple of "sizeof(void*)"; otherwise, *nitems* is not used and the length of the block should be "tp_basicsize". 
This function should not do any other instance initialization, not even to allocate additional memory; that should be done by "tp_new". typedef void (*destructor)(PyObject*) * Part of the Stable ABI.* typedef void (*freefunc)(void*) See "tp_free". typedef PyObject *(*newfunc)(PyTypeObject*, PyObject*, PyObject*) * Part of the Stable ABI.* See "tp_new". typedef int (*initproc)(PyObject*, PyObject*, PyObject*) * Part of the Stable ABI.* See "tp_init". typedef PyObject *(*reprfunc)(PyObject*) * Part of the Stable ABI.* See "tp_repr". typedef PyObject *(*getattrfunc)(PyObject *self, char *attr) * Part of the Stable ABI.* Return the value of the named attribute for the object. typedef int (*setattrfunc)(PyObject *self, char *attr, PyObject *value) * Part of the Stable ABI.* Set the value of the named attribute for the object. The value argument is set to "NULL" to delete the attribute. typedef PyObject *(*getattrofunc)(PyObject *self, PyObject *attr) * Part of the Stable ABI.* Return the value of the named attribute for the object. See "tp_getattro". typedef int (*setattrofunc)(PyObject *self, PyObject *attr, PyObject *value) * Part of the Stable ABI.* Set the value of the named attribute for the object. The value argument is set to "NULL" to delete the attribute. See "tp_setattro". typedef PyObject *(*descrgetfunc)(PyObject*, PyObject*, PyObject*) * Part of the Stable ABI.* See "tp_descr_get". typedef int (*descrsetfunc)(PyObject*, PyObject*, PyObject*) * Part of the Stable ABI.* See "tp_descr_set". typedef Py_hash_t (*hashfunc)(PyObject*) * Part of the Stable ABI.* See "tp_hash". typedef PyObject *(*richcmpfunc)(PyObject*, PyObject*, int) * Part of the Stable ABI.* See "tp_richcompare". typedef PyObject *(*getiterfunc)(PyObject*) * Part of the Stable ABI.* See "tp_iter". typedef PyObject *(*iternextfunc)(PyObject*) * Part of the Stable ABI.* See "tp_iternext". typedef Py_ssize_t (*lenfunc)(PyObject*) * Part of the Stable ABI.* typedef int (*getbufferproc)(PyObject*, Py_buffer*, int) * Part of the Stable ABI since version 3.12.* typedef void (*releasebufferproc)(PyObject*, Py_buffer*) * Part of the Stable ABI since version 3.12.* typedef PyObject *(*unaryfunc)(PyObject*) * Part of the Stable ABI.* typedef PyObject *(*binaryfunc)(PyObject*, PyObject*) * Part of the Stable ABI.* typedef PySendResult (*sendfunc)(PyObject*, PyObject*, PyObject**) See "am_send". typedef PyObject *(*ternaryfunc)(PyObject*, PyObject*, PyObject*) * Part of the Stable ABI.* typedef PyObject *(*ssizeargfunc)(PyObject*, Py_ssize_t) * Part of the Stable ABI.* typedef int (*ssizeobjargproc)(PyObject*, Py_ssize_t, PyObject*) * Part of the Stable ABI.* typedef int (*objobjproc)(PyObject*, PyObject*) * Part of the Stable ABI.* typedef int (*objobjargproc)(PyObject*, PyObject*, PyObject*) * Part of the Stable ABI.* Examples ======== The following are simple examples of Python type definitions. They include common usage you may encounter. Some demonstrate tricky corner cases. For more examples, practical info, and a tutorial, see Defining Extension Types: Tutorial and Defining Extension Types: Assorted Topics. 
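The examples below leave out the module setup. A minimal sketch of the init function they assume, which readies the static type and is also the natural place to assign "tp_base" (see the note under that slot), might look like this; the name "mymod" matches the examples, and the rest is illustrative:

   #include <Python.h>

   static struct PyModuleDef mymodule = {
       PyModuleDef_HEAD_INIT,
       .m_name = "mymod",
       .m_size = -1,
   };

   PyMODINIT_FUNC
   PyInit_mymod(void)
   {
       /* For a subclass of a built-in, tp_base would be assigned here,
          before PyType_Ready() is called. */
       if (PyType_Ready(&MyObject_Type) < 0) {
           return NULL;
       }
       PyObject *m = PyModule_Create(&mymodule);
       if (m == NULL) {
           return NULL;
       }
       if (PyModule_AddObjectRef(m, "MyObject",
                                 (PyObject *)&MyObject_Type) < 0) {
           Py_DECREF(m);
           return NULL;
       }
       return m;
   }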
A basic static type:

   typedef struct {
       PyObject_HEAD
       const char *data;
   } MyObject;

   static PyTypeObject MyObject_Type = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "mymod.MyObject",
       .tp_basicsize = sizeof(MyObject),
       .tp_doc = PyDoc_STR("My objects"),
       .tp_new = myobj_new,
       .tp_dealloc = (destructor)myobj_dealloc,
       .tp_repr = (reprfunc)myobj_repr,
   };

You may also find older code (especially in the CPython code base) with a more verbose initializer:

   static PyTypeObject MyObject_Type = {
       PyVarObject_HEAD_INIT(NULL, 0)
       "mymod.MyObject",               /* tp_name */
       sizeof(MyObject),               /* tp_basicsize */
       0,                              /* tp_itemsize */
       (destructor)myobj_dealloc,      /* tp_dealloc */
       0,                              /* tp_vectorcall_offset */
       0,                              /* tp_getattr */
       0,                              /* tp_setattr */
       0,                              /* tp_as_async */
       (reprfunc)myobj_repr,           /* tp_repr */
       0,                              /* tp_as_number */
       0,                              /* tp_as_sequence */
       0,                              /* tp_as_mapping */
       0,                              /* tp_hash */
       0,                              /* tp_call */
       0,                              /* tp_str */
       0,                              /* tp_getattro */
       0,                              /* tp_setattro */
       0,                              /* tp_as_buffer */
       0,                              /* tp_flags */
       PyDoc_STR("My objects"),        /* tp_doc */
       0,                              /* tp_traverse */
       0,                              /* tp_clear */
       0,                              /* tp_richcompare */
       0,                              /* tp_weaklistoffset */
       0,                              /* tp_iter */
       0,                              /* tp_iternext */
       0,                              /* tp_methods */
       0,                              /* tp_members */
       0,                              /* tp_getset */
       0,                              /* tp_base */
       0,                              /* tp_dict */
       0,                              /* tp_descr_get */
       0,                              /* tp_descr_set */
       0,                              /* tp_dictoffset */
       0,                              /* tp_init */
       0,                              /* tp_alloc */
       myobj_new,                      /* tp_new */
   };

A type that supports weakrefs, instance dicts, and hashing:

   typedef struct {
       PyObject_HEAD
       const char *data;
   } MyObject;

   static PyTypeObject MyObject_Type = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "mymod.MyObject",
       .tp_basicsize = sizeof(MyObject),
       .tp_doc = PyDoc_STR("My objects"),
       .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE |
                   Py_TPFLAGS_HAVE_GC | Py_TPFLAGS_MANAGED_DICT |
                   Py_TPFLAGS_MANAGED_WEAKREF,
       .tp_new = myobj_new,
       .tp_traverse = (traverseproc)myobj_traverse,
       .tp_clear = (inquiry)myobj_clear,
       .tp_alloc = PyType_GenericAlloc,
       .tp_dealloc = (destructor)myobj_dealloc,
       .tp_repr = (reprfunc)myobj_repr,
       .tp_hash = (hashfunc)myobj_hash,
       .tp_richcompare = PyBaseObject_Type.tp_richcompare,
   };

(Note that "tp_alloc" is an "allocfunc": "PyType_GenericAlloc()" is the recommended allocator for static types, whereas "PyType_GenericNew()" is a "newfunc" and belongs in "tp_new".)

A str subclass that cannot be subclassed and cannot be called to create instances (e.g. one that uses a separate factory function), using the "Py_TPFLAGS_DISALLOW_INSTANTIATION" flag:

   typedef struct {
       PyUnicodeObject raw;
       char *extra;
   } MyStr;

   static PyTypeObject MyStr_Type = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "mymod.MyStr",
       .tp_basicsize = sizeof(MyStr),
       .tp_base = NULL,  // set to &PyUnicode_Type in module init
       .tp_doc = PyDoc_STR("my custom str"),
       .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_DISALLOW_INSTANTIATION,
       .tp_repr = (reprfunc)myobj_repr,
   };

The simplest static type with fixed-length instances:

   typedef struct {
       PyObject_HEAD
   } MyObject;

   static PyTypeObject MyObject_Type = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "mymod.MyObject",
   };

The simplest static type with variable-length instances:

   typedef struct {
       PyObject_VAR_HEAD
       const char *data[1];
   } MyObject;

   static PyTypeObject MyObject_Type = {
       PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "mymod.MyObject",
       .tp_basicsize = sizeof(MyObject) - sizeof(char *),
       .tp_itemsize = sizeof(char *),
   };

Unicode Objects and Codecs
**************************

Unicode Objects
===============

Since the implementation of **PEP 393** in Python 3.3, Unicode objects internally use a variety of representations, in order to allow handling the complete range of Unicode characters while staying memory efficient.
There are special cases for strings where all code points are below 128, 256, or 65536; otherwise, code points must be below 1114112 (which is the full Unicode range). The UTF-8 representation is created on demand and cached in the Unicode object. Note: The "Py_UNICODE" representation, together with its deprecated APIs, was removed in Python 3.12. See **PEP 623** for more information. Unicode Type ------------ These are the basic Unicode object types used for the Unicode implementation in Python: type Py_UCS4 type Py_UCS2 type Py_UCS1 * Part of the Stable ABI.* These types are typedefs for unsigned integer types wide enough to contain characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with single Unicode characters, use "Py_UCS4". Added in version 3.3. type Py_UNICODE This is a typedef of "wchar_t", which is a 16-bit type or 32-bit type depending on the platform. Changed in version 3.3: In previous versions, this was a 16-bit type or a 32-bit type depending on whether you selected a “narrow” or “wide” Unicode version of Python at build time. Deprecated since version 3.13, will be removed in version 3.15. type PyASCIIObject type PyCompactUnicodeObject type PyUnicodeObject These subtypes of "PyObject" represent a Python Unicode object. In almost all cases, they shouldn’t be used directly, since all API functions that deal with Unicode objects take and return "PyObject" pointers. Added in version 3.3. PyTypeObject PyUnicode_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python Unicode type. It is exposed to Python code as "str". PyTypeObject PyUnicodeIter_Type * Part of the Stable ABI.* This instance of "PyTypeObject" represents the Python Unicode iterator type. It is used to iterate over Unicode string objects. The following APIs are C macros and static inline functions for fast checks and access to internal read-only data of Unicode objects: int PyUnicode_Check(PyObject *obj) Return true if the object *obj* is a Unicode object or an instance of a Unicode subtype. This function always succeeds. int PyUnicode_CheckExact(PyObject *obj) Return true if the object *obj* is a Unicode object, but not an instance of a subtype. This function always succeeds. int PyUnicode_READY(PyObject *unicode) Returns "0". This API is kept only for backward compatibility. Added in version 3.3. Deprecated since version 3.10: This API does nothing since Python 3.12. Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode) Return the length of the Unicode string, in code points. *unicode* has to be a Unicode object in the “canonical” representation (not checked). Added in version 3.3. Py_UCS1 *PyUnicode_1BYTE_DATA(PyObject *unicode) Py_UCS2 *PyUnicode_2BYTE_DATA(PyObject *unicode) Py_UCS4 *PyUnicode_4BYTE_DATA(PyObject *unicode) Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4 integer types for direct character access. No checks are performed that the canonical representation has the correct character size; use "PyUnicode_KIND()" to select the right function. Added in version 3.3. PyUnicode_1BYTE_KIND PyUnicode_2BYTE_KIND PyUnicode_4BYTE_KIND Return values of the "PyUnicode_KIND()" macro. Added in version 3.3. Changed in version 3.12: "PyUnicode_WCHAR_KIND" has been removed. int PyUnicode_KIND(PyObject *unicode) Return one of the PyUnicode kind constants (see above) that indicates how many bytes per character this Unicode object uses to store its data. *unicode* has to be a Unicode object in the “canonical” representation (not checked). Added in version 3.3.
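The kind, combined with "PyUnicode_DATA()" and "PyUnicode_READ()" described just below, enables a fast scanning loop over a string's code points. A minimal sketch, assuming *unicode* is in the canonical representation (the helper name "count_char" is illustrative):

   static Py_ssize_t
   count_char(PyObject *unicode, Py_UCS4 ch)
   {
       int kind = PyUnicode_KIND(unicode);     /* bytes per code point */
       void *data = PyUnicode_DATA(unicode);   /* raw canonical buffer */
       Py_ssize_t n = PyUnicode_GET_LENGTH(unicode);
       Py_ssize_t i, count = 0;

       for (i = 0; i < n; i++) {
           if (PyUnicode_READ(kind, data, i) == ch) {
               count++;
           }
       }
       return count;
   }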
void *PyUnicode_DATA(PyObject *unicode) Return a void pointer to the raw Unicode buffer. *unicode* has to be a Unicode object in the “canonical” representation (not checked). Added in version 3.3. void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, Py_UCS4 value) Write into a canonical representation *data* (as obtained with "PyUnicode_DATA()"). This function performs no sanity checks, and is intended for usage in loops. The caller should cache the *kind* value and *data* pointer as obtained from other calls. *index* is the index in the string (starts at 0) and *value* is the new code point value which should be written to that location. Added in version 3.3. Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index) Read a code point from a canonical representation *data* (as obtained with "PyUnicode_DATA()"). No checks or ready calls are performed. Added in version 3.3. Py_UCS4 PyUnicode_READ_CHAR(PyObject *unicode, Py_ssize_t index) Read a character from a Unicode object *unicode*, which must be in the “canonical” representation. This is less efficient than "PyUnicode_READ()" if you do multiple consecutive reads. Added in version 3.3. Py_UCS4 PyUnicode_MAX_CHAR_VALUE(PyObject *unicode) Return the maximum code point that is suitable for creating another string based on *unicode*, which must be in the “canonical” representation. This is always an approximation but more efficient than iterating over the string. Added in version 3.3. int PyUnicode_IsIdentifier(PyObject *unicode) * Part of the Stable ABI.* Return "1" if the string is a valid identifier according to the language definition, section Identifiers and keywords. Return "0" otherwise. Changed in version 3.9: The function does not call "Py_FatalError()" anymore if the string is not ready. Unicode Character Properties ---------------------------- Unicode provides many different character properties. The most often needed ones are available through these macros which are mapped to C functions depending on the Python configuration. int Py_UNICODE_ISSPACE(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a whitespace character. int Py_UNICODE_ISLOWER(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a lowercase character. int Py_UNICODE_ISUPPER(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is an uppercase character. int Py_UNICODE_ISTITLE(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a titlecase character. int Py_UNICODE_ISLINEBREAK(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a linebreak character. int Py_UNICODE_ISDECIMAL(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a decimal character. int Py_UNICODE_ISDIGIT(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a digit character. int Py_UNICODE_ISNUMERIC(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a numeric character. int Py_UNICODE_ISALPHA(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is an alphabetic character. int Py_UNICODE_ISALNUM(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is an alphanumeric character. int Py_UNICODE_ISPRINTABLE(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a printable character, in the sense of "str.isprintable()". These APIs can be used for fast direct character conversions: Py_UCS4 Py_UNICODE_TOLOWER(Py_UCS4 ch) Return the character *ch* converted to lower case. Py_UCS4 Py_UNICODE_TOUPPER(Py_UCS4 ch) Return the character *ch* converted to upper case. 
Py_UCS4 Py_UNICODE_TOTITLE(Py_UCS4 ch) Return the character *ch* converted to title case. int Py_UNICODE_TODECIMAL(Py_UCS4 ch) Return the character *ch* converted to a decimal positive integer. Return "-1" if this is not possible. This function does not raise exceptions. int Py_UNICODE_TODIGIT(Py_UCS4 ch) Return the character *ch* converted to a single digit integer. Return "-1" if this is not possible. This function does not raise exceptions. double Py_UNICODE_TONUMERIC(Py_UCS4 ch) Return the character *ch* converted to a double. Return "-1.0" if this is not possible. This function does not raise exceptions. These APIs can be used to work with surrogates: int Py_UNICODE_IS_SURROGATE(Py_UCS4 ch) Check if *ch* is a surrogate ("0xD800 <= ch <= 0xDFFF"). int Py_UNICODE_IS_HIGH_SURROGATE(Py_UCS4 ch) Check if *ch* is a high surrogate ("0xD800 <= ch <= 0xDBFF"). int Py_UNICODE_IS_LOW_SURROGATE(Py_UCS4 ch) Check if *ch* is a low surrogate ("0xDC00 <= ch <= 0xDFFF"). Py_UCS4 Py_UNICODE_JOIN_SURROGATES(Py_UCS4 high, Py_UCS4 low) Join two surrogate code points and return a single "Py_UCS4" value. *high* and *low* are respectively the leading and trailing surrogates in a surrogate pair. *high* must be in the range [0xD800; 0xDBFF] and *low* must be in the range [0xDC00; 0xDFFF]. Creating and accessing Unicode strings -------------------------------------- To create Unicode objects and access their basic sequence properties, use these APIs: PyObject *PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar) *Return value: New reference.* Create a new Unicode object. *maxchar* should be the true maximum code point to be placed in the string. As an approximation, it can be rounded up to the nearest value in the sequence 127, 255, 65535, 1114111. This is the recommended way to allocate a new Unicode object. Objects created using this function are not resizable. On error, set an exception and return "NULL". Added in version 3.3. PyObject *PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size) *Return value: New reference.* Create a new Unicode object with the given *kind* (possible values are "PyUnicode_1BYTE_KIND" etc., as returned by "PyUnicode_KIND()"). The *buffer* must point to an array of *size* units of 1, 2 or 4 bytes per character, as given by the kind. If necessary, the input *buffer* is copied and transformed into the canonical representation. For example, if the *buffer* is a UCS4 string ("PyUnicode_4BYTE_KIND") and it consists only of codepoints in the UCS1 range, it will be transformed into UCS1 ("PyUnicode_1BYTE_KIND"). Added in version 3.3. PyObject *PyUnicode_FromStringAndSize(const char *str, Py_ssize_t size) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object from the char buffer *str*. The bytes will be interpreted as being UTF-8 encoded. The buffer is copied into the new object. The return value might be a shared object, i.e. modification of the data is not allowed. This function raises "SystemError" when: * *size* < 0, * *str* is "NULL" and *size* > 0 Changed in version 3.12: *str* == "NULL" with *size* > 0 is not allowed anymore. PyObject *PyUnicode_FromString(const char *str) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object from a UTF-8 encoded null-terminated char buffer *str*. PyObject *PyUnicode_FromFormat(const char *format, ...) 
*Return value: New reference.** Part of the Stable ABI.* Take a C "printf()"-style *format* string and a variable number of arguments, calculate the size of the resulting Python Unicode string and return a string with the values formatted into it. The variable arguments must be C types and must correspond exactly to the format characters in the *format* ASCII-encoded string. A conversion specifier contains two or more characters and has the following components, which must occur in this order:

1. The "'%'" character, which marks the start of the specifier.

2. Conversion flags (optional), which affect the result of some conversion types.

3. Minimum field width (optional). If specified as an "'*'" (asterisk), the actual width is given in the next argument, which must be of type int, and the object to convert comes after the minimum field width and optional precision.

4. Precision (optional), given as a "'.'" (dot) followed by the precision. If specified as "'*'" (an asterisk), the actual precision is given in the next argument, which must be of type int, and the value to convert comes after the precision.

5. Length modifier (optional).

6. Conversion type.

The conversion flag characters are:

   +---------+---------------------------------------------------------------+
   | Flag    | Meaning                                                       |
   |=========|===============================================================|
   | "0"     | The conversion will be zero padded for numeric values.       |
   +---------+---------------------------------------------------------------+
   | "-"     | The converted value is left adjusted (overrides the "0" flag |
   |         | if both are given).                                           |
   +---------+---------------------------------------------------------------+

The length modifiers for the following integer conversions ("d", "i", "o", "u", "x", or "X") specify the type of the argument (int by default):

   +------------+-------------------------------------------------------+
   | Modifier   | Types                                                 |
   |============|=======================================================|
   | "l"        | long or unsigned long                                 |
   +------------+-------------------------------------------------------+
   | "ll"       | long long or unsigned long long                       |
   +------------+-------------------------------------------------------+
   | "j"        | "intmax_t" or "uintmax_t"                             |
   +------------+-------------------------------------------------------+
   | "z"        | "size_t" or "ssize_t"                                 |
   +------------+-------------------------------------------------------+
   | "t"        | "ptrdiff_t"                                           |
   +------------+-------------------------------------------------------+

The length modifier "l" for the following conversions "s" or "V" specifies that the type of the argument is const wchar_t*. The conversion specifiers are:

   +------------+----------------------------------+----------------------------------------------+
   | Conversion | Type                             | Comment                                      |
   | Specifier  |                                  |                                              |
   |============|==================================|==============================================|
   | "%"        | *n/a*                            | The literal "%" character.                   |
   +------------+----------------------------------+----------------------------------------------+
   | "d", "i"   | Specified by the length modifier | The decimal representation of a signed C     |
   |            |                                  | integer.                                     |
   +------------+----------------------------------+----------------------------------------------+
   | "u"        | Specified by the length modifier | The decimal representation of an unsigned C  |
   |            |                                  | integer.                                     |
   +------------+----------------------------------+----------------------------------------------+
   | "o"        | Specified by the length modifier | The octal representation of an unsigned C    |
   |            |                                  | integer.                                     |
   +------------+----------------------------------+----------------------------------------------+
   | "x"        | Specified by the length modifier | The hexadecimal representation of an         |
   |            |                                  | unsigned C integer (lowercase).              |
   +------------+----------------------------------+----------------------------------------------+
   | "X"        | Specified by the length modifier | The hexadecimal representation of an         |
   |            |                                  | unsigned C integer (uppercase).              |
   +------------+----------------------------------+----------------------------------------------+
   | "c"        | int                              | A single character.                          |
   +------------+----------------------------------+----------------------------------------------+
   | "s"        | const char* or const wchar_t*    | A null-terminated C character array.         |
   +------------+----------------------------------+----------------------------------------------+
   | "p"        | const void*                      | The hex representation of a C pointer.       |
   |            |                                  | Mostly equivalent to "printf("%p")" except   |
   |            |                                  | that it is guaranteed to start with the      |
   |            |                                  | literal "0x" regardless of what the          |
   |            |                                  | platform’s "printf" yields.                  |
   +------------+----------------------------------+----------------------------------------------+
   | "A"        | PyObject*                        | The result of calling "ascii()".             |
   +------------+----------------------------------+----------------------------------------------+
   | "U"        | PyObject*                        | A Unicode object.                            |
   +------------+----------------------------------+----------------------------------------------+
   | "V"        | PyObject*, const char* or const  | A Unicode object (which may be "NULL") and   |
   |            | wchar_t*                         | a null-terminated C character array as a     |
   |            |                                  | second parameter (which will be used, if     |
   |            |                                  | the first parameter is "NULL").              |
   +------------+----------------------------------+----------------------------------------------+
   | "S"        | PyObject*                        | The result of calling "PyObject_Str()".      |
   +------------+----------------------------------+----------------------------------------------+
   | "R"        | PyObject*                        | The result of calling "PyObject_Repr()".     |
   +------------+----------------------------------+----------------------------------------------+
   | "T"        | PyObject*                        | Get the fully qualified name of an object    |
   |            |                                  | type; call                                   |
   |            |                                  | "PyType_GetFullyQualifiedName()".            |
   +------------+----------------------------------+----------------------------------------------+
   | "#T"       | PyObject*                        | Similar to "T" format, but use a colon (":") |
   |            |                                  | as separator between the module name and     |
   |            |                                  | the qualified name.                          |
   +------------+----------------------------------+----------------------------------------------+
   | "N"        | PyTypeObject*                    | Get the fully qualified name of a type;      |
   |            |                                  | call "PyType_GetFullyQualifiedName()".       |
   +------------+----------------------------------+----------------------------------------------+
   | "#N"       | PyTypeObject*                    | Similar to "N" format, but use a colon (":") |
   |            |                                  | as separator between the module name and     |
   |            |                                  | the qualified name.                          |
   +------------+----------------------------------+----------------------------------------------+
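As a brief, hedged illustration of this API in use (the helper name "make_message" is invented for the sketch), the following combines a length-modified integer conversion with object conversions; it returns a new reference, or "NULL" with an exception set:

   static PyObject *
   make_message(PyObject *obj, Py_ssize_t index)
   {
       /* "%zd" formats a Py_ssize_t ("z" modifier + "d" conversion);
          "%R" embeds repr(obj); "%s" embeds a C string. */
       return PyUnicode_FromFormat("item %zd of %R is not a %s",
                                   index, obj, "string");
   }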
Note: The width formatter unit is the number of characters rather than bytes. The precision formatter unit is the number of bytes or "wchar_t" items (if the length modifier "l" is used) for ""%s"" and ""%V"" (if the "PyObject*" argument is "NULL"), and a number of characters for ""%A"", ""%U"", ""%S"", ""%R"" and ""%V"" (if the "PyObject*" argument is not "NULL"). Note: Unlike C "printf()", the "0" flag has effect even when a precision is given for integer conversions ("d", "i", "u", "o", "x", or "X"). Changed in version 3.2: Support for ""%lld"" and ""%llu"" added. Changed in version 3.3: Support for ""%li"", ""%lli"" and ""%zi"" added. Changed in version 3.4: Support width and precision formatter for ""%s"", ""%A"", ""%U"", ""%V"", ""%S"", ""%R"" added. Changed in version 3.12: Support for conversion specifiers "o" and "X". Support for length modifiers "j" and "t". Length modifiers are now applied to all integer conversions. Length modifier "l" is now applied to conversion specifiers "s" and "V". Support for variable width and precision "*". Support for flag "-". An unrecognized format character now sets a "SystemError". In previous versions it caused all the rest of the format string to be copied as-is to the result string, and any extra arguments discarded. Changed in version 3.13: Support for "%T", "%#T", "%N" and "%#N" formats added. PyObject *PyUnicode_FromFormatV(const char *format, va_list vargs) *Return value: New reference.** Part of the Stable ABI.* Identical to "PyUnicode_FromFormat()" except that it takes exactly two arguments. PyObject *PyUnicode_FromObject(PyObject *obj) *Return value: New reference.** Part of the Stable ABI.* Copy an instance of a Unicode subtype to a new true Unicode object if necessary. If *obj* is already a true Unicode object (not a subtype), return a new *strong reference* to the object. Objects other than Unicode or its subtypes will cause a "TypeError". PyObject *PyUnicode_FromOrdinal(int ordinal) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object from the given Unicode code point *ordinal*. The ordinal must be in "range(0x110000)". A "ValueError" is raised if it is not. PyObject *PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Decode an encoded object *obj* to a Unicode object. "bytes", "bytearray" and other *bytes-like objects* are decoded according to the given *encoding* and using the error handling defined by *errors*. Both can be "NULL" to have the interface use the default values (see Built-in Codecs for details). All other objects, including Unicode objects, cause a "TypeError" to be set. The API returns "NULL" if there was an error. The caller is responsible for decref’ing the returned object. PyObject *PyUnicode_BuildEncodingMap(PyObject *string) *Return value: New reference.** Part of the Stable ABI.* Return a mapping suitable for decoding a custom single-byte encoding. Given a Unicode string *string* of up to 256 characters representing an encoding table, returns either a compact internal mapping object or a dictionary mapping character ordinals to byte values. Raises a "TypeError" and returns "NULL" on invalid input. Added in version 3.2. const char *PyUnicode_GetDefaultEncoding(void) * Part of the Stable ABI.* Return the name of the default string encoding, ""utf-8"". See "sys.getdefaultencoding()".
The returned string does not need to be freed, and is valid until interpreter shutdown. Py_ssize_t PyUnicode_GetLength(PyObject *unicode) * Part of the Stable ABI since version 3.7.* Return the length of the Unicode object, in code points. On error, set an exception and return "-1". Added in version 3.3. Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many) Copy characters from one Unicode object into another. This function performs character conversion when necessary and falls back to "memcpy()" if possible. Returns "-1" and sets an exception on error, otherwise returns the number of copied characters. Added in version 3.3. Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, Py_ssize_t length, Py_UCS4 fill_char) Fill a string with a character: write *fill_char* into "unicode[start:start+length]". Fail if *fill_char* is bigger than the string maximum character, or if the string has more than 1 reference. Return the number of written characters, or return "-1" and raise an exception on error. Added in version 3.3. int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, Py_UCS4 character) * Part of the Stable ABI since version 3.7.* Write a character to a string. The string must have been created through "PyUnicode_New()". Since Unicode strings are supposed to be immutable, the string must not be shared or have been hashed yet. This function checks that *unicode* is a Unicode object, that the index is not out of bounds, and that the object can be modified safely (i.e. that its reference count is one). Return "0" on success, "-1" on error with an exception set. Added in version 3.3. Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index) * Part of the Stable ABI since version 3.7.* Read a character from a string. This function checks that *unicode* is a Unicode object and the index is not out of bounds, in contrast to "PyUnicode_READ_CHAR()", which performs no error checking. Return the character on success, "-1" on error with an exception set. Added in version 3.3. PyObject *PyUnicode_Substring(PyObject *unicode, Py_ssize_t start, Py_ssize_t end) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Return a substring of *unicode*, from character index *start* (included) to character index *end* (excluded). Negative indices are not supported. On error, set an exception and return "NULL". Added in version 3.3. Py_UCS4 *PyUnicode_AsUCS4(PyObject *unicode, Py_UCS4 *buffer, Py_ssize_t buflen, int copy_null) * Part of the Stable ABI since version 3.7.* Copy the string *unicode* into a UCS4 buffer, including a null character, if *copy_null* is set. Returns "NULL" and sets an exception on error (in particular, a "SystemError" if *buflen* is smaller than the length of *unicode*). *buffer* is returned on success. Added in version 3.3. Py_UCS4 *PyUnicode_AsUCS4Copy(PyObject *unicode) * Part of the Stable ABI since version 3.7.* Copy the string *unicode* into a new UCS4 buffer that is allocated using "PyMem_Malloc()". If this fails, "NULL" is returned with a "MemoryError" set. The returned buffer always has an extra null code point appended. Added in version 3.3. Locale Encoding --------------- The current locale encoding can be used to decode text from the operating system.
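As a hedged sketch of the round trip with the two functions documented below (the helper names are illustrative; "surrogateescape" is the **PEP 383** error handler):

   static PyObject *
   text_from_locale(const char *s)
   {
       /* bytes from the OS -> str; undecodable bytes become surrogates */
       return PyUnicode_DecodeLocale(s, "surrogateescape");
   }

   static PyObject *
   bytes_to_locale(PyObject *text)
   {
       /* str -> bytes object in the locale encoding */
       return PyUnicode_EncodeLocale(text, "surrogateescape");
   }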
PyObject *PyUnicode_DecodeLocaleAndSize(const char *str, Py_ssize_t length, const char *errors) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Decode a string from UTF-8 on Android and VxWorks, or from the current locale encoding on other platforms. The supported error handlers are ""strict"" and ""surrogateescape"" (**PEP 383**). The decoder uses the ""strict"" error handler if *errors* is "NULL". *str* must end with a null character but cannot contain embedded null characters. Use "PyUnicode_DecodeFSDefaultAndSize()" to decode a string from the *filesystem encoding and error handler*. This function ignores the Python UTF-8 Mode. See also: The "Py_DecodeLocale()" function. Added in version 3.3. Changed in version 3.7: The function now also uses the current locale encoding for the "surrogateescape" error handler, except on Android. Previously, "Py_DecodeLocale()" was used for "surrogateescape", and the current locale encoding was used for "strict". PyObject *PyUnicode_DecodeLocale(const char *str, const char *errors) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Similar to "PyUnicode_DecodeLocaleAndSize()", but compute the string length using "strlen()". Added in version 3.3. PyObject *PyUnicode_EncodeLocale(PyObject *unicode, const char *errors) *Return value: New reference.** Part of the Stable ABI since version 3.7.* Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current locale encoding on other platforms. The supported error handlers are ""strict"" and ""surrogateescape"" (**PEP 383**). The encoder uses the ""strict"" error handler if *errors* is "NULL". Return a "bytes" object. *unicode* cannot contain embedded null characters. Use "PyUnicode_EncodeFSDefault()" to encode a string to the *filesystem encoding and error handler*. This function ignores the Python UTF-8 Mode. See also: The "Py_EncodeLocale()" function. Added in version 3.3. Changed in version 3.7: The function now also uses the current locale encoding for the "surrogateescape" error handler, except on Android. Previously, "Py_EncodeLocale()" was used for "surrogateescape", and the current locale encoding was used for "strict". File System Encoding -------------------- Functions encoding to and decoding from the *filesystem encoding and error handler* (**PEP 383** and **PEP 529**). To encode file names to "bytes" during argument parsing, the ""O&"" converter should be used, passing "PyUnicode_FSConverter()" as the conversion function: int PyUnicode_FSConverter(PyObject *obj, void *result) * Part of the Stable ABI.* PyArg_Parse* converter: encode "str" objects – obtained directly or through the "os.PathLike" interface – to "bytes" using "PyUnicode_EncodeFSDefault()"; "bytes" objects are output as-is. *result* must be an address of a C variable of type PyObject* (or PyBytesObject*). On success, set the variable to a new *strong reference* to a bytes object which must be released when it is no longer used and return a non-zero value ("Py_CLEANUP_SUPPORTED"). Embedded null bytes are not allowed in the result. On failure, return "0" with an exception set. If *obj* is "NULL", the function releases a strong reference stored in the variable referred to by *result* and returns "1". Added in version 3.1. Changed in version 3.6: Accepts a *path-like object*.
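A minimal sketch of the converter in use (the function name is illustrative):

   static PyObject *
   my_stat(PyObject *self, PyObject *args)
   {
       PyObject *path = NULL;

       /* "O&" calls PyUnicode_FSConverter(arg, &path) */
       if (!PyArg_ParseTuple(args, "O&", PyUnicode_FSConverter, &path)) {
           return NULL;
       }
       /* PyBytes_AS_STRING(path) now points to a file system path */
       Py_DECREF(path);  /* release the reference the converter created */
       Py_RETURN_NONE;
   }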
To decode file names to "str" during argument parsing, the ""O&"" converter should be used, passing "PyUnicode_FSDecoder()" as the conversion function: int PyUnicode_FSDecoder(PyObject *obj, void *result) * Part of the Stable ABI.* PyArg_Parse* converter: decode "bytes" objects – obtained either directly or indirectly through the "os.PathLike" interface – to "str" using "PyUnicode_DecodeFSDefaultAndSize()"; "str" objects are output as-is. *result* must be an address of a C variable of type PyObject* (or PyUnicodeObject*). On success, set the variable to a new *strong reference* to a Unicode object which must be released when it is no longer used and return a non-zero value ("Py_CLEANUP_SUPPORTED"). Embedded null characters are not allowed in the result. On failure, return "0" with an exception set. If *obj* is "NULL", release the strong reference to the object referred to by *result* and return "1". Added in version 3.2. Changed in version 3.6: Accepts a *path-like object*. PyObject *PyUnicode_DecodeFSDefaultAndSize(const char *str, Py_ssize_t size) *Return value: New reference.** Part of the Stable ABI.* Decode a string from the *filesystem encoding and error handler*. If you need to decode a string from the current locale encoding, use "PyUnicode_DecodeLocaleAndSize()". See also: The "Py_DecodeLocale()" function. Changed in version 3.6: The *filesystem error handler* is now used. PyObject *PyUnicode_DecodeFSDefault(const char *str) *Return value: New reference.** Part of the Stable ABI.* Decode a null-terminated string from the *filesystem encoding and error handler*. If the string length is known, use "PyUnicode_DecodeFSDefaultAndSize()". Changed in version 3.6: The *filesystem error handler* is now used. PyObject *PyUnicode_EncodeFSDefault(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI.* Encode a Unicode object to the *filesystem encoding and error handler*, and return "bytes". Note that the resulting "bytes" object can contain null bytes. If you need to encode a string to the current locale encoding, use "PyUnicode_EncodeLocale()". See also: The "Py_EncodeLocale()" function. Added in version 3.2. Changed in version 3.6: The *filesystem error handler* is now used. wchar_t Support --------------- "wchar_t" support for platforms which support it: PyObject *PyUnicode_FromWideChar(const wchar_t *wstr, Py_ssize_t size) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object from the "wchar_t" buffer *wstr* of the given *size*. Passing "-1" as the *size* indicates that the function must itself compute the length, using "wcslen()". Return "NULL" on failure. Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *wstr, Py_ssize_t size) * Part of the Stable ABI.* Copy the Unicode object contents into the "wchar_t" buffer *wstr*. At most *size* "wchar_t" characters are copied (excluding a possibly trailing null termination character). Return the number of "wchar_t" characters copied or "-1" in case of an error. When *wstr* is "NULL", instead return the *size* that would be required to store all of *unicode* including a terminating null. Note that the resulting wchar_t* string may or may not be null-terminated. It is the responsibility of the caller to make sure that the wchar_t* string is null-terminated in case this is required by the application. Also, note that the wchar_t* string might contain null characters, which would cause the string to be truncated when used with most C functions.
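The usual two-call pattern, as a hedged sketch ("to_wide" is an illustrative name; "PyUnicode_AsWideCharString()" below wraps the same idea):

   static wchar_t *
   to_wide(PyObject *unicode)
   {
       /* First call: NULL buffer -> required size, including the null */
       Py_ssize_t size = PyUnicode_AsWideChar(unicode, NULL, 0);
       if (size < 0) {
           return NULL;
       }
       wchar_t *wstr = PyMem_New(wchar_t, size);
       if (wstr == NULL) {
           PyErr_NoMemory();
           return NULL;
       }
       /* Second call: the buffer is large enough to hold the null too */
       if (PyUnicode_AsWideChar(unicode, wstr, size) < 0) {
           PyMem_Free(wstr);
           return NULL;
       }
       return wstr;  /* caller frees with PyMem_Free() */
   }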
wchar_t *PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size) * Part of the Stable ABI since version 3.7.* Convert the Unicode object to a wide character string. The output string always ends with a null character. If *size* is not "NULL", write the number of wide characters (excluding the trailing null termination character) into **size*. Note that the resulting "wchar_t" string might contain null characters, which would cause the string to be truncated when used with most C functions. If *size* is "NULL" and the wchar_t* string contains null characters, a "ValueError" is raised. Returns a buffer allocated by "PyMem_New" (use "PyMem_Free()" to free it) on success. On error, returns "NULL" and **size* is undefined. Raises a "MemoryError" if memory allocation fails. Added in version 3.2. Changed in version 3.7: Raises a "ValueError" if *size* is "NULL" and the wchar_t* string contains null characters. Built-in Codecs =============== Python provides a set of built-in codecs which are written in C for speed. All of these codecs are directly usable via the following functions. Many of the following APIs take two arguments, *encoding* and *errors*, and they have the same semantics as the parameters of the same name in the built-in "str()" string object constructor. Setting *encoding* to "NULL" causes the default encoding to be used, which is UTF-8. The file system calls should use "PyUnicode_FSConverter()" for encoding file names. This uses the *filesystem encoding and error handler* internally. Error handling is set by *errors*, which may also be set to "NULL", meaning to use the default handling defined for the codec. Default error handling for all built-in codecs is “strict” ("ValueError" is raised). The codecs all use a similar interface. Only deviations from the following generic ones are documented for simplicity. Generic Codecs -------------- These are the generic codec APIs: PyObject *PyUnicode_Decode(const char *str, Py_ssize_t size, const char *encoding, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object by decoding *size* bytes of the encoded string *str*. *encoding* and *errors* have the same meaning as the parameters of the same name in the "str()" built-in function. The codec to be used is looked up using the Python codec registry. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Encode a Unicode object and return the result as a Python bytes object. *encoding* and *errors* have the same meaning as the parameters of the same name in the Unicode "encode()" method. The codec to be used is looked up using the Python codec registry. Return "NULL" if an exception was raised by the codec. UTF-8 Codecs ------------ These are the UTF-8 codec APIs: PyObject *PyUnicode_DecodeUTF8(const char *str, Py_ssize_t size, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string *str*. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_DecodeUTF8Stateful(const char *str, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) *Return value: New reference.** Part of the Stable ABI.* If *consumed* is "NULL", behave like "PyUnicode_DecodeUTF8()". If *consumed* is not "NULL", trailing incomplete UTF-8 byte sequences will not be treated as an error.
Those bytes will not be decoded and the number of bytes that have been decoded will be stored in *consumed*. PyObject *PyUnicode_AsUTF8String(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI.* Encode a Unicode object using UTF-8 and return the result as a Python bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. The function fails if the string contains surrogate code points ("U+D800" - "U+DFFF"). const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size) * Part of the Stable ABI since version 3.10.* Return a pointer to the UTF-8 encoding of the Unicode object, and store the size of the encoded representation (in bytes) in *size*. The *size* argument can be "NULL"; in this case no size will be stored. The returned buffer always has an extra null byte appended (not included in *size*), regardless of whether there are any other null code points. On error, set an exception, set *size* to "-1" (if it’s not NULL) and return "NULL". The function fails if the string contains surrogate code points ("U+D800" - "U+DFFF"). This caches the UTF-8 representation of the string in the Unicode object, and subsequent calls will return a pointer to the same buffer. The caller is not responsible for deallocating the buffer. The buffer is deallocated and pointers to it become invalid when the Unicode object is garbage collected. Added in version 3.3. Changed in version 3.7: The return type is now "const char *" rather than "char *". Changed in version 3.10: This function is a part of the limited API. const char *PyUnicode_AsUTF8(PyObject *unicode) As "PyUnicode_AsUTF8AndSize()", but does not store the size. Warning: This function does not have any special behavior for null characters embedded within *unicode*. As a result, strings containing null characters will remain in the returned string, which some C functions might interpret as the end of the string, leading to truncation. If truncation is an issue, it is recommended to use "PyUnicode_AsUTF8AndSize()" instead. Added in version 3.3. Changed in version 3.7: The return type is now "const char *" rather than "char *". UTF-32 Codecs ------------- These are the UTF-32 codec APIs: PyObject *PyUnicode_DecodeUTF32(const char *str, Py_ssize_t size, const char *errors, int *byteorder) *Return value: New reference.** Part of the Stable ABI.* Decode *size* bytes from a UTF-32 encoded buffer string and return the corresponding Unicode object. *errors* (if non-"NULL") defines the error handling. It defaults to “strict”. If *byteorder* is non-"NULL", the decoder starts decoding using the given byte order:

   *byteorder == -1: little endian
   *byteorder == 0:  native order
   *byteorder == 1:  big endian

If "*byteorder" is zero, and the first four bytes of the input data are a byte order mark (BOM), the decoder switches to this byte order and the BOM is not copied into the resulting Unicode string. If "*byteorder" is "-1" or "1", any byte order mark is copied to the output. After completion, **byteorder* is set to the current byte order at the end of input data. If *byteorder* is "NULL", the codec starts in native order mode. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_DecodeUTF32Stateful(const char *str, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed) *Return value: New reference.** Part of the Stable ABI.* If *consumed* is "NULL", behave like "PyUnicode_DecodeUTF32()".
If *consumed* is not "NULL", "PyUnicode_DecodeUTF32Stateful()" will not treat trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible by four) as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in *consumed*. PyObject *PyUnicode_AsUTF32String(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI.* Return a Python byte string using the UTF-32 encoding in native byte order. The string always starts with a BOM. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. UTF-16 Codecs ------------- These are the UTF-16 codec APIs: PyObject *PyUnicode_DecodeUTF16(const char *str, Py_ssize_t size, const char *errors, int *byteorder) *Return value: New reference.** Part of the Stable ABI.* Decode *size* bytes from a UTF-16 encoded buffer string and return the corresponding Unicode object. *errors* (if non-"NULL") defines the error handling. It defaults to “strict”. If *byteorder* is non-"NULL", the decoder starts decoding using the given byte order:

   *byteorder == -1: little endian
   *byteorder == 0:  native order
   *byteorder == 1:  big endian

If "*byteorder" is zero, and the first two bytes of the input data are a byte order mark (BOM), the decoder switches to this byte order and the BOM is not copied into the resulting Unicode string. If "*byteorder" is "-1" or "1", any byte order mark is copied to the output (where it will result in either a "\ufeff" or a "\ufffe" character). After completion, "*byteorder" is set to the current byte order at the end of input data. If *byteorder* is "NULL", the codec starts in native order mode. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_DecodeUTF16Stateful(const char *str, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed) *Return value: New reference.** Part of the Stable ABI.* If *consumed* is "NULL", behave like "PyUnicode_DecodeUTF16()". If *consumed* is not "NULL", "PyUnicode_DecodeUTF16Stateful()" will not treat trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a split surrogate pair) as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in *consumed*. PyObject *PyUnicode_AsUTF16String(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI.* Return a Python byte string using the UTF-16 encoding in native byte order. The string always starts with a BOM. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. UTF-7 Codecs ------------ These are the UTF-7 codec APIs: PyObject *PyUnicode_DecodeUTF7(const char *str, Py_ssize_t size, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string *str*. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_DecodeUTF7Stateful(const char *str, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) *Return value: New reference.** Part of the Stable ABI.* If *consumed* is "NULL", behave like "PyUnicode_DecodeUTF7()". If *consumed* is not "NULL", trailing incomplete UTF-7 base-64 sections will not be treated as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in *consumed*.
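The *consumed* pattern is the same for all of the stateful decoders above. A minimal sketch using the UTF-8 variant (the names are illustrative):

   static PyObject *
   decode_chunk(const char *buf, Py_ssize_t len, Py_ssize_t *tail)
   {
       Py_ssize_t consumed;
       PyObject *text = PyUnicode_DecodeUTF8Stateful(buf, len, "strict",
                                                     &consumed);
       if (text == NULL) {
           return NULL;
       }
       /* bytes in buf[consumed:len] must be carried over to the next call */
       *tail = len - consumed;
       return text;
   }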
Unicode-Escape Codecs --------------------- These are the “Unicode Escape” codec APIs: PyObject *PyUnicode_DecodeUnicodeEscape(const char *str, Py_ssize_t size, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded string *str*. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_AsUnicodeEscapeString(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI.* Encode a Unicode object using Unicode-Escape and return the result as a bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. Raw-Unicode-Escape Codecs ------------------------- These are the “Raw Unicode Escape” codec APIs: PyObject *PyUnicode_DecodeRawUnicodeEscape(const char *str, Py_ssize_t size, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape encoded string *str*. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI.* Encode a Unicode object using Raw-Unicode-Escape and return the result as a bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. Latin-1 Codecs -------------- These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode ordinals and only these are accepted by the codecs during encoding. PyObject *PyUnicode_DecodeLatin1(const char *str, Py_ssize_t size, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string *str*. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_AsLatin1String(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI.* Encode a Unicode object using Latin-1 and return the result as a Python bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. ASCII Codecs ------------ These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other codes generate errors. PyObject *PyUnicode_DecodeASCII(const char *str, Py_ssize_t size, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object by decoding *size* bytes of the ASCII encoded string *str*. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_AsASCIIString(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI.* Encode a Unicode object using ASCII and return the result as a Python bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. Character Map Codecs -------------------- This codec is special in that it can be used to implement many different codecs (and this is in fact what was done to obtain most of the standard codecs included in the "encodings" package). The codec uses mappings to encode and decode characters. The mapping objects provided must support the "__getitem__()" mapping interface; dictionaries and sequences work well. These are the mapping codec APIs: PyObject *PyUnicode_DecodeCharmap(const char *str, Py_ssize_t length, PyObject *mapping, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Create a Unicode object by decoding *length* bytes of the encoded string *str* using the given *mapping* object.
Return "NULL" if an exception was raised by the codec. If *mapping* is "NULL", Latin-1 decoding will be applied. Else *mapping* must map bytes ordinals (integers in the range from 0 to 255) to Unicode strings, integers (which are then interpreted as Unicode ordinals) or "None". Unmapped data bytes – ones which cause a "LookupError", as well as ones which get mapped to "None", "0xFFFE" or "'\ufffe'", are treated as undefined mappings and cause an error. PyObject *PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping) *Return value: New reference.** Part of the Stable ABI.* Encode a Unicode object using the given *mapping* object and return the result as a bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. The *mapping* object must map Unicode ordinal integers to bytes objects, integers in the range from 0 to 255 or "None". Unmapped character ordinals (ones which cause a "LookupError") as well as mapped to "None" are treated as “undefined mapping” and cause an error. The following codec API is special in that maps Unicode to Unicode. PyObject *PyUnicode_Translate(PyObject *unicode, PyObject *table, const char *errors) *Return value: New reference.** Part of the Stable ABI.* Translate a string by applying a character mapping table to it and return the resulting Unicode object. Return "NULL" if an exception was raised by the codec. The mapping table must map Unicode ordinal integers to Unicode ordinal integers or "None" (causing deletion of the character). Mapping tables need only provide the "__getitem__()" interface; dictionaries and sequences work well. Unmapped character ordinals (ones which cause a "LookupError") are left untouched and are copied as-is. *errors* has the usual meaning for codecs. It may be "NULL" which indicates to use the default error handling. MBCS codecs for Windows ----------------------- These are the MBCS codec APIs. They are currently only available on Windows and use the Win32 MBCS converters to implement the conversions. Note that MBCS (or DBCS) is a class of encodings, not just one. The target encoding is defined by the user settings on the machine running the codec. PyObject *PyUnicode_DecodeMBCS(const char *str, Py_ssize_t size, const char *errors) *Return value: New reference.** Part of the Stable ABI on Windows since version 3.7.* Create a Unicode object by decoding *size* bytes of the MBCS encoded string *str*. Return "NULL" if an exception was raised by the codec. PyObject *PyUnicode_DecodeMBCSStateful(const char *str, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) *Return value: New reference.** Part of the Stable ABI on Windows since version 3.7.* If *consumed* is "NULL", behave like "PyUnicode_DecodeMBCS()". If *consumed* is not "NULL", "PyUnicode_DecodeMBCSStateful()" will not decode trailing lead byte and the number of bytes that have been decoded will be stored in *consumed*. PyObject *PyUnicode_DecodeCodePageStateful(int code_page, const char *str, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) *Return value: New reference.** Part of the Stable ABI on Windows since version 3.7.* Similar to "PyUnicode_DecodeMBCSStateful()", except uses the code page specified by *code_page*. PyObject *PyUnicode_AsMBCSString(PyObject *unicode) *Return value: New reference.** Part of the Stable ABI on Windows since version 3.7.* Encode a Unicode object using MBCS and return the result as Python bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. 
PyObject *PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors) *Return value: New reference.** Part of the Stable ABI on Windows since version 3.7.* Encode the Unicode object using the specified code page and return a Python bytes object. Return "NULL" if an exception was raised by the codec. Use "CP_ACP" code page to get the MBCS encoder. Added in version 3.3. Methods and Slot Functions ========================== The following APIs are capable of handling Unicode objects and strings on input (we refer to them as strings in the descriptions) and return Unicode objects or integers as appropriate. They all return "NULL" or "-1" if an exception occurs. PyObject *PyUnicode_Concat(PyObject *left, PyObject *right) *Return value: New reference.** Part of the Stable ABI.* Concatenate two strings, giving a new Unicode string. PyObject *PyUnicode_Split(PyObject *unicode, PyObject *sep, Py_ssize_t maxsplit) *Return value: New reference.** Part of the Stable ABI.* Split a string giving a list of Unicode strings. If *sep* is "NULL", splitting will be done at all whitespace substrings. Otherwise, splits occur at the given separator. At most *maxsplit* splits will be done. If negative, no limit is set. Separators are not included in the resulting list. On error, return "NULL" with an exception set. Equivalent to "str.split()". PyObject *PyUnicode_RSplit(PyObject *unicode, PyObject *sep, Py_ssize_t maxsplit) *Return value: New reference.** Part of the Stable ABI.* Similar to "PyUnicode_Split()", but splitting will be done beginning at the end of the string. On error, return "NULL" with an exception set. Equivalent to "str.rsplit()". PyObject *PyUnicode_Splitlines(PyObject *unicode, int keepends) *Return value: New reference.** Part of the Stable ABI.* Split a Unicode string at line breaks, returning a list of Unicode strings. CRLF is considered to be one line break. If *keepends* is "0", the line break characters are not included in the resulting strings. PyObject *PyUnicode_Partition(PyObject *unicode, PyObject *sep) *Return value: New reference.** Part of the Stable ABI.* Split a Unicode string at the first occurrence of *sep*, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings. *sep* must not be empty. On error, return "NULL" with an exception set. Equivalent to "str.partition()". PyObject *PyUnicode_RPartition(PyObject *unicode, PyObject *sep) *Return value: New reference.** Part of the Stable ABI.* Similar to "PyUnicode_Partition()", but split a Unicode string at the last occurrence of *sep*. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself. *sep* must not be empty. On error, return "NULL" with an exception set. Equivalent to "str.rpartition()". PyObject *PyUnicode_Join(PyObject *separator, PyObject *seq) *Return value: New reference.** Part of the Stable ABI.* Join a sequence of strings using the given *separator* and return the resulting Unicode string. Py_ssize_t PyUnicode_Tailmatch(PyObject *unicode, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction) * Part of the Stable ABI.* Return "1" if *substr* matches "unicode[start:end]" at the given tail end (*direction* == "-1" means to do a prefix match, *direction* == "1" a suffix match), "0" otherwise. Return "-1" if an error occurred.
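As a small illustration of combining the split and join APIs above (the helper name is illustrative), the following sketch re-joins the whitespace-separated words of a string with a given separator:

   static PyObject *
   normalize_spaces(PyObject *unicode, PyObject *sep)
   {
       /* NULL separator: split at runs of whitespace; -1: no split limit */
       PyObject *parts = PyUnicode_Split(unicode, NULL, -1);
       if (parts == NULL) {
           return NULL;
       }
       PyObject *result = PyUnicode_Join(sep, parts);
       Py_DECREF(parts);
       return result;
   }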
Py_ssize_t PyUnicode_Find(PyObject *unicode, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction) * Part of the Stable ABI.* Return the first position of *substr* in "unicode[start:end]" using the given *direction* (*direction* == "1" means to do a forward search, *direction* == "-1" a backward search). The return value is the index of the first match; a value of "-1" indicates that no match was found, and "-2" indicates that an error occurred and an exception has been set. Py_ssize_t PyUnicode_FindChar(PyObject *unicode, Py_UCS4 ch, Py_ssize_t start, Py_ssize_t end, int direction) * Part of the Stable ABI since version 3.7.* Return the first position of the character *ch* in "unicode[start:end]" using the given *direction* (*direction* == "1" means to do a forward search, *direction* == "-1" a backward search). The return value is the index of the first match; a value of "-1" indicates that no match was found, and "-2" indicates that an error occurred and an exception has been set. Added in version 3.3. Changed in version 3.7: *start* and *end* are now adjusted to behave like "unicode[start:end]". Py_ssize_t PyUnicode_Count(PyObject *unicode, PyObject *substr, Py_ssize_t start, Py_ssize_t end) * Part of the Stable ABI.* Return the number of non-overlapping occurrences of *substr* in "unicode[start:end]". Return "-1" if an error occurred. PyObject *PyUnicode_Replace(PyObject *unicode, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount) *Return value: New reference.** Part of the Stable ABI.* Replace at most *maxcount* occurrences of *substr* in *unicode* with *replstr* and return the resulting Unicode object. *maxcount* == "-1" means replace all occurrences. int PyUnicode_Compare(PyObject *left, PyObject *right) * Part of the Stable ABI.* Compare two strings and return "-1", "0", "1" for less than, equal, and greater than, respectively. This function returns "-1" upon failure, so one should call "PyErr_Occurred()" to check for errors. int PyUnicode_EqualToUTF8AndSize(PyObject *unicode, const char *string, Py_ssize_t size) * Part of the Stable ABI since version 3.13.* Compare a Unicode object with a char buffer which is interpreted as being UTF-8 or ASCII encoded and return true ("1") if they are equal, or false ("0") otherwise. If the Unicode object contains surrogate code points ("U+D800" - "U+DFFF") or the C string is not valid UTF-8, false ("0") is returned. This function does not raise exceptions. Added in version 3.13. int PyUnicode_EqualToUTF8(PyObject *unicode, const char *string) * Part of the Stable ABI since version 3.13.* Similar to "PyUnicode_EqualToUTF8AndSize()", but compute *string* length using "strlen()". If the Unicode object contains null characters, false ("0") is returned. Added in version 3.13. int PyUnicode_CompareWithASCIIString(PyObject *unicode, const char *string) * Part of the Stable ABI.* Compare a Unicode object, *unicode*, with *string* and return "-1", "0", "1" for less than, equal, and greater than, respectively. It is best to pass only ASCII-encoded strings, but the function interprets the input string as ISO-8859-1 if it contains non-ASCII characters. This function does not raise exceptions. 
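For example, a flag value can be tested against a C literal without creating any temporary objects; a minimal sketch (the function name is illustrative):

   static int
   is_strict(PyObject *errors)
   {
       /* returns 1 on equality, 0 otherwise; never raises */
       return PyUnicode_EqualToUTF8(errors, "strict");
   }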
PyObject *PyUnicode_RichCompare(PyObject *left, PyObject *right, int op) *Return value: New reference.** Part of the Stable ABI.* Rich compare two Unicode strings and return one of the following: * "NULL" in case an exception was raised * "Py_True" or "Py_False" for successful comparisons * "Py_NotImplemented" in case the type combination is unknown Possible values for *op* are "Py_GT", "Py_GE", "Py_EQ", "Py_NE", "Py_LT", and "Py_LE". PyObject *PyUnicode_Format(PyObject *format, PyObject *args) *Return value: New reference.** Part of the Stable ABI.* Return a new string object from *format* and *args*; this is analogous to "format % args". int PyUnicode_Contains(PyObject *unicode, PyObject *substr) * Part of the Stable ABI.* Check whether *substr* is contained in *unicode* and return true or false accordingly. *substr* has to coerce to a one-element Unicode string. "-1" is returned if there was an error. void PyUnicode_InternInPlace(PyObject **p_unicode) * Part of the Stable ABI.* Intern the argument *p_unicode in place. The argument must be the address of a pointer variable pointing to a Python Unicode string object. If there is an existing interned string that is the same as *p_unicode, it sets *p_unicode to it (releasing the reference to the old string object and creating a new *strong reference* to the interned string object), otherwise it leaves *p_unicode alone and interns it. (Clarification: even though there is a lot of talk about references, think of this function as reference-neutral. You must own the object you pass in; after the call you no longer own the passed-in reference, but you newly own the result.) This function never raises an exception. On error, it leaves its argument unchanged without interning it. Instances of subclasses of "str" may not be interned, that is, PyUnicode_CheckExact(*p_unicode) must be true. If it is not, then – as with any other error – the argument is left unchanged. Note that interned strings are not “immortal”. You must keep a reference to the result to benefit from interning. PyObject *PyUnicode_InternFromString(const char *str) *Return value: New reference.** Part of the Stable ABI.* A combination of "PyUnicode_FromString()" and "PyUnicode_InternInPlace()", meant for statically allocated strings. Return a new (“owned”) reference to either a new Unicode string object that has been interned, or an earlier interned string object with the same value. Python may keep a reference to the result, or make it *immortal*, preventing it from being garbage-collected promptly. For interning an unbounded number of different strings, such as ones coming from user input, prefer calling "PyUnicode_FromString()" and "PyUnicode_InternInPlace()" directly. Utilities ********* The functions in this chapter perform various utility tasks, ranging from helping C code be more portable across platforms to using Python modules from C, parsing function arguments, and constructing Python values from C values.
* Operating System Utilities

* System Functions

* Process Control

* Importing Modules

* Data marshalling support

* Parsing arguments and building values

  * Parsing arguments

    * Strings and buffers

    * Numbers

    * Other objects

    * API Functions

  * Building values

* String conversion and formatting

* PyHash API

* Reflection

* Codec registry and support functions

  * Codec lookup API

  * Registry API for Unicode encoding error handlers

* PyTime C API

  * Types

  * Clock Functions

  * Raw Clock Functions

  * Conversion functions

* Support for Perf Maps

The Very High Level Layer
*************************

The functions in this chapter will let you execute Python source code given in a file or a buffer, but they will not let you interact in a more detailed way with the interpreter.

Several of these functions accept a start symbol from the grammar as a parameter. The available start symbols are "Py_eval_input", "Py_file_input", and "Py_single_input". These are described following the functions which accept them as parameters.

Note also that several of these functions take FILE* parameters. One particular issue which needs to be handled carefully is that the "FILE" structure for different C libraries can be different and incompatible. Under Windows (at least), it is possible for dynamically linked extensions to actually use different libraries, so care should be taken that FILE* parameters are only passed to these functions if it is certain that they were created by the same library that the Python runtime is using.

int PyRun_AnyFile(FILE *fp, const char *filename)

   This is a simplified interface to "PyRun_AnyFileExFlags()" below, leaving *closeit* set to "0" and *flags* set to "NULL".

int PyRun_AnyFileFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)

   This is a simplified interface to "PyRun_AnyFileExFlags()" below, leaving the *closeit* argument set to "0".

int PyRun_AnyFileEx(FILE *fp, const char *filename, int closeit)

   This is a simplified interface to "PyRun_AnyFileExFlags()" below, leaving the *flags* argument set to "NULL".

int PyRun_AnyFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)

   If *fp* refers to a file associated with an interactive device (console or terminal input or Unix pseudo-terminal), return the value of "PyRun_InteractiveLoop()", otherwise return the result of "PyRun_SimpleFile()". *filename* is decoded from the filesystem encoding ("sys.getfilesystemencoding()"). If *filename* is "NULL", this function uses ""???"" as the filename. If *closeit* is true, the file is closed before "PyRun_SimpleFileExFlags()" returns.

int PyRun_SimpleString(const char *command)

   This is a simplified interface to "PyRun_SimpleStringFlags()" below, leaving the "PyCompilerFlags"* argument set to "NULL".

int PyRun_SimpleStringFlags(const char *command, PyCompilerFlags *flags)

   Executes the Python source code from *command* in the "__main__" module according to the *flags* argument. If "__main__" does not already exist, it is created. Returns "0" on success or "-1" if an exception was raised. If there was an error, there is no way to get the exception information. For the meaning of *flags*, see below.

   Note that if an otherwise unhandled "SystemExit" is raised, this function will not return "-1", but exit the process, as long as "PyConfig.inspect" is zero.

int PyRun_SimpleFile(FILE *fp, const char *filename)

   This is a simplified interface to "PyRun_SimpleFileExFlags()" below, leaving *closeit* set to "0" and *flags* set to "NULL".
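As a concrete reference point, here is a minimal embedding sketch built on "PyRun_SimpleString()". It assumes a standalone program linked against libpython; the printed message is invented for this example.

   #include <Python.h>

   int
   main(void)
   {
       Py_Initialize();

       /* Runs in __main__; returns 0 on success, -1 if an exception
          was raised (the traceback is printed, not returned). */
       int rc = PyRun_SimpleString("print('hello from embedded Python')");

       if (Py_FinalizeEx() < 0) {
           return 120;                      /* finalization failed */
       }
       return rc == 0 ? 0 : 1;
   }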
int PyRun_SimpleFileEx(FILE *fp, const char *filename, int closeit)

   This is a simplified interface to "PyRun_SimpleFileExFlags()" below, leaving *flags* set to "NULL".

int PyRun_SimpleFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)

   Similar to "PyRun_SimpleStringFlags()", but the Python source code is read from *fp* instead of an in-memory string. *filename* should be the name of the file; it is decoded from the *filesystem encoding and error handler*. If *closeit* is true, the file is closed before "PyRun_SimpleFileExFlags()" returns.

   Note: On Windows, *fp* should be opened as binary mode (e.g. "fopen(filename, "rb")"). Otherwise, Python may not handle script files with LF line endings correctly.

int PyRun_InteractiveOne(FILE *fp, const char *filename)

   This is a simplified interface to "PyRun_InteractiveOneFlags()" below, leaving *flags* set to "NULL".

int PyRun_InteractiveOneFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)

   Read and execute a single statement from a file associated with an interactive device according to the *flags* argument. The user will be prompted using "sys.ps1" and "sys.ps2". *filename* is decoded from the *filesystem encoding and error handler*.

   Returns "0" when the input was executed successfully, "-1" if there was an exception, or an error code from the "errcode.h" include file distributed as part of Python if there was a parse error. (Note that "errcode.h" is not included by "Python.h", so it must be included specifically if needed.)

int PyRun_InteractiveLoop(FILE *fp, const char *filename)

   This is a simplified interface to "PyRun_InteractiveLoopFlags()" below, leaving *flags* set to "NULL".

int PyRun_InteractiveLoopFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)

   Read and execute statements from a file associated with an interactive device until EOF is reached. The user will be prompted using "sys.ps1" and "sys.ps2". *filename* is decoded from the *filesystem encoding and error handler*. Returns "0" at EOF or a negative number upon failure.

int (*PyOS_InputHook)(void)

   *Part of the Stable ABI.*

   Can be set to point to a function with the prototype "int func(void)". The function will be called when Python’s interpreter prompt is about to become idle and wait for user input from the terminal. The return value is ignored. Overriding this hook can be used to integrate the interpreter’s prompt with other event loops, as done in "Modules/_tkinter.c" in the Python source code.

   Changed in version 3.12: This function is only called from the main interpreter.

char *(*PyOS_ReadlineFunctionPointer)(FILE*, FILE*, const char*)

   Can be set to point to a function with the prototype "char *func(FILE *stdin, FILE *stdout, char *prompt)", overriding the default function used to read a single line of input at the interpreter’s prompt. The function is expected to output the string *prompt* if it’s not "NULL", and then read a line of input from the provided standard input file, returning the resulting string. For example, the "readline" module sets this hook to provide line-editing and tab-completion features.

   The result must be a string allocated by "PyMem_RawMalloc()" or "PyMem_RawRealloc()", or "NULL" if an error occurred.

   Changed in version 3.4: The result must be allocated by "PyMem_RawMalloc()" or "PyMem_RawRealloc()", instead of being allocated by "PyMem_Malloc()" or "PyMem_Realloc()".

   Changed in version 3.12: This function is only called from the main interpreter.
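A sketch of a replacement readline function may make the allocation rule above concrete. This is not how any particular module does it: the canned input line is invented, and a real hook would read from the provided input file or pump an event loop instead.

   #include <Python.h>
   #include <string.h>

   /* Always "reads" the same line.  The buffer must come from
      PyMem_RawMalloc()/PyMem_RawRealloc(), as required since 3.4;
      returning NULL signals an error. */
   static char *
   canned_readline(FILE *in, FILE *out, const char *prompt)
   {
       (void)in;                            /* unused in this sketch */
       if (prompt != NULL) {
           fputs(prompt, out);
           fflush(out);
       }
       const char *line = "print('hooked input')\n";
       char *buf = (char *)PyMem_RawMalloc(strlen(line) + 1);
       if (buf != NULL) {
           strcpy(buf, line);
       }
       return buf;
   }

   /* Installed with: PyOS_ReadlineFunctionPointer = canned_readline; */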
PyObject *PyRun_String(const char *str, int start, PyObject *globals, PyObject *locals)

   *Return value: New reference.*

   This is a simplified interface to "PyRun_StringFlags()" below, leaving *flags* set to "NULL".

PyObject *PyRun_StringFlags(const char *str, int start, PyObject *globals, PyObject *locals, PyCompilerFlags *flags)

   *Return value: New reference.*

   Execute Python source code from *str* in the context specified by the objects *globals* and *locals* with the compiler flags specified by *flags*. *globals* must be a dictionary; *locals* can be any object that implements the mapping protocol. The parameter *start* specifies the start token that should be used to parse the source code.

   Returns the result of executing the code as a Python object, or "NULL" if an exception was raised.

PyObject *PyRun_File(FILE *fp, const char *filename, int start, PyObject *globals, PyObject *locals)

   *Return value: New reference.*

   This is a simplified interface to "PyRun_FileExFlags()" below, leaving *closeit* set to "0" and *flags* set to "NULL".

PyObject *PyRun_FileEx(FILE *fp, const char *filename, int start, PyObject *globals, PyObject *locals, int closeit)

   *Return value: New reference.*

   This is a simplified interface to "PyRun_FileExFlags()" below, leaving *flags* set to "NULL".

PyObject *PyRun_FileFlags(FILE *fp, const char *filename, int start, PyObject *globals, PyObject *locals, PyCompilerFlags *flags)

   *Return value: New reference.*

   This is a simplified interface to "PyRun_FileExFlags()" below, leaving *closeit* set to "0".

PyObject *PyRun_FileExFlags(FILE *fp, const char *filename, int start, PyObject *globals, PyObject *locals, int closeit, PyCompilerFlags *flags)

   *Return value: New reference.*

   Similar to "PyRun_StringFlags()", but the Python source code is read from *fp* instead of an in-memory string. *filename* should be the name of the file; it is decoded from the *filesystem encoding and error handler*. If *closeit* is true, the file is closed before "PyRun_FileExFlags()" returns.

PyObject *Py_CompileString(const char *str, const char *filename, int start)

   *Return value: New reference.* *Part of the Stable ABI.*

   This is a simplified interface to "Py_CompileStringFlags()" below, leaving *flags* set to "NULL".

PyObject *Py_CompileStringFlags(const char *str, const char *filename, int start, PyCompilerFlags *flags)

   *Return value: New reference.*

   This is a simplified interface to "Py_CompileStringExFlags()" below, with *optimize* set to "-1".

PyObject *Py_CompileStringObject(const char *str, PyObject *filename, int start, PyCompilerFlags *flags, int optimize)

   *Return value: New reference.*

   Parse and compile the Python source code in *str*, returning the resulting code object. The start token is given by *start*; this can be used to constrain the code which can be compiled and should be "Py_eval_input", "Py_file_input", or "Py_single_input". The filename specified by *filename* is used to construct the code object and may appear in tracebacks or "SyntaxError" exception messages. This returns "NULL" if the code cannot be parsed or compiled.

   The integer *optimize* specifies the optimization level of the compiler; a value of "-1" selects the optimization level of the interpreter as given by "-O" options. Explicit levels are "0" (no optimization; "__debug__" is true), "1" (asserts are removed, "__debug__" is false) or "2" (docstrings are removed too).

   Added in version 3.4.
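To tie "PyRun_String()" to the start symbols, here is a small sketch that evaluates a single expression with "Py_eval_input". The helper name "eval_expression" and the expression itself are invented for this example.

   #include <Python.h>

   /* Evaluate "6 * 7" in a fresh namespace.  Returns a new reference
      to the result, or NULL with an exception set. */
   static PyObject *
   eval_expression(void)
   {
       PyObject *globals = PyDict_New();
       if (globals == NULL) {
           return NULL;
       }
       /* The same dict serves as both globals and locals here. */
       PyObject *result = PyRun_String("6 * 7", Py_eval_input,
                                       globals, globals);
       Py_DECREF(globals);
       return result;                       /* e.g. the int 42 */
   }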
PyObject *Py_CompileStringExFlags(const char *str, const char *filename, int start, PyCompilerFlags *flags, int optimize)

   *Return value: New reference.*

   Like "Py_CompileStringObject()", but *filename* is a byte string decoded from the *filesystem encoding and error handler*.

   Added in version 3.2.

PyObject *PyEval_EvalCode(PyObject *co, PyObject *globals, PyObject *locals)

   *Return value: New reference.* *Part of the Stable ABI.*

   This is a simplified interface to "PyEval_EvalCodeEx()", with just the code object, and global and local variables. The other arguments are set to "NULL".

PyObject *PyEval_EvalCodeEx(PyObject *co, PyObject *globals, PyObject *locals, PyObject *const *args, int argcount, PyObject *const *kws, int kwcount, PyObject *const *defs, int defcount, PyObject *kwdefs, PyObject *closure)

   *Return value: New reference.* *Part of the Stable ABI.*

   Evaluate a precompiled code object, given a particular environment for its evaluation. This environment consists of a dictionary of global variables, a mapping object of local variables, arrays of arguments, keywords and defaults, a dictionary of default values for keyword-only arguments and a closure tuple of cells.

PyObject *PyEval_EvalFrame(PyFrameObject *f)

   *Return value: New reference.* *Part of the Stable ABI.*

   Evaluate an execution frame. This is a simplified interface to "PyEval_EvalFrameEx()", for backward compatibility.

PyObject *PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)

   *Return value: New reference.* *Part of the Stable ABI.*

   This is the main, unvarnished function of Python interpretation. The code object associated with the execution frame *f* is executed, interpreting bytecode and executing calls as needed. The additional *throwflag* parameter can mostly be ignored; if true, it causes an exception to be thrown immediately. This is used for the "throw()" methods of generator objects.

   Changed in version 3.4: This function now includes a debug assertion to help ensure that it does not silently discard an active exception.

int PyEval_MergeCompilerFlags(PyCompilerFlags *cf)

   This function changes the flags of the current evaluation frame, and returns true on success, false on failure.

int Py_eval_input

   The start symbol from the Python grammar for isolated expressions; for use with "Py_CompileString()".

int Py_file_input

   The start symbol from the Python grammar for sequences of statements as read from a file or other source; for use with "Py_CompileString()". This is the symbol to use when compiling arbitrarily long Python source code.

int Py_single_input

   The start symbol from the Python grammar for a single statement; for use with "Py_CompileString()". This is the symbol used for the interactive interpreter loop.

struct PyCompilerFlags

   This is the structure used to hold compiler flags. In cases where code is only being compiled, it is passed as "int flags", and in cases where code is being executed, it is passed as "PyCompilerFlags *flags". In this case, "from __future__ import" can modify *flags*.

   Whenever "PyCompilerFlags *flags" is "NULL", "cf_flags" is treated as equal to "0", and any modification due to "from __future__ import" is discarded.

   int cf_flags

      Compiler flags.

   int cf_feature_version

      *cf_feature_version* is the minor Python version. It should be initialized to "PY_MINOR_VERSION". The field is ignored by default; it is used if and only if the "PyCF_ONLY_AST" flag is set in "cf_flags".

   Changed in version 3.8: Added *cf_feature_version* field.
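One common reason to use the compilation and evaluation layers separately is to compile once and execute many times. A minimal sketch, with abbreviated error handling and an invented helper name:

   #include <Python.h>

   /* Compile a statement once with Py_CompileString(), then execute
      the resulting code object twice with PyEval_EvalCode().
      Returns 0 on success, -1 with an exception set. */
   static int
   run_twice(void)
   {
       PyObject *code = Py_CompileString("print('tick')",
                                         "<example>", Py_file_input);
       if (code == NULL) {
           return -1;
       }
       PyObject *globals = PyDict_New();
       if (globals == NULL) {
           Py_DECREF(code);
           return -1;
       }
       int rc = 0;
       for (int i = 0; i < 2 && rc == 0; i++) {
           PyObject *result = PyEval_EvalCode(code, globals, globals);
           if (result == NULL) {
               rc = -1;                     /* exception set */
           }
           else {
               Py_DECREF(result);           /* None for Py_file_input */
           }
       }
       Py_DECREF(globals);
       Py_DECREF(code);
       return rc;
   }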
The available compiler flags are accessible as macros:

PyCF_ALLOW_TOP_LEVEL_AWAIT

PyCF_ONLY_AST

PyCF_OPTIMIZED_AST

PyCF_TYPE_COMMENTS

   See compiler flags in documentation of the "ast" Python module, which exports these constants under the same names.

int CO_FUTURE_DIVISION

   This bit can be set in *flags* to cause division operator "/" to be interpreted as “true division” according to **PEP 238**.

Weak Reference Objects
**********************

Python supports *weak references* as first-class objects. There are two specific object types which directly implement weak references. The first is a simple reference object, and the second acts as a proxy for the original object as much as it can.

int PyWeakref_Check(PyObject *ob)

   Return non-zero if *ob* is either a reference or proxy object. This function always succeeds.

int PyWeakref_CheckRef(PyObject *ob)

   Return non-zero if *ob* is a reference object. This function always succeeds.

int PyWeakref_CheckProxy(PyObject *ob)

   Return non-zero if *ob* is a proxy object. This function always succeeds.

PyObject *PyWeakref_NewRef(PyObject *ob, PyObject *callback)

   *Return value: New reference.* *Part of the Stable ABI.*

   Return a weak reference object for the object *ob*. This will always return a new reference, but is not guaranteed to create a new object; an existing reference object may be returned. The second parameter, *callback*, can be a callable object that receives notification when *ob* is garbage collected; it should accept a single parameter, which will be the weak reference object itself. *callback* may also be "None" or "NULL". If *ob* is not a weakly referenceable object, or if *callback* is not callable, "None", or "NULL", this will return "NULL" and raise "TypeError".

PyObject *PyWeakref_NewProxy(PyObject *ob, PyObject *callback)

   *Return value: New reference.* *Part of the Stable ABI.*

   Return a weak reference proxy object for the object *ob*. This will always return a new reference, but is not guaranteed to create a new object; an existing proxy object may be returned. The second parameter, *callback*, can be a callable object that receives notification when *ob* is garbage collected; it should accept a single parameter, which will be the weak reference object itself. *callback* may also be "None" or "NULL". If *ob* is not a weakly referenceable object, or if *callback* is not callable, "None", or "NULL", this will return "NULL" and raise "TypeError".

int PyWeakref_GetRef(PyObject *ref, PyObject **pobj)

   *Part of the Stable ABI since version 3.13.*

   Get a *strong reference* to the referenced object from a weak reference, *ref*, into **pobj*.

   * On success, set **pobj* to a new *strong reference* to the referenced object and return 1.

   * If the reference is dead, set **pobj* to "NULL" and return 0.

   * On error, raise an exception and return -1.

   Added in version 3.13.

PyObject *PyWeakref_GetObject(PyObject *ref)

   *Return value: Borrowed reference.* *Part of the Stable ABI.*

   Return a *borrowed reference* to the referenced object from a weak reference, *ref*. If the referent is no longer live, returns "Py_None".

   Note: This function returns a *borrowed reference* to the referenced object. This means that you should always call "Py_INCREF()" on the object except when it cannot be destroyed before the last usage of the borrowed reference.

   Deprecated since version 3.13, will be removed in version 3.15: Use "PyWeakref_GetRef()" instead.
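The pairing of "PyWeakref_NewRef()" with the preferred "PyWeakref_GetRef()" accessor looks like this in practice. The helper name "use_weakly" is invented, and real code would do something useful where the comment stands.

   #include <Python.h>

   /* Take a weak reference to obj, then try to recover a strong
      reference through it.  Returns 0 on success, -1 on error. */
   static int
   use_weakly(PyObject *obj)
   {
       PyObject *ref = PyWeakref_NewRef(obj, NULL);   /* no callback */
       if (ref == NULL) {
           return -1;       /* e.g. obj is not weakly referenceable */
       }
       PyObject *strong;
       int rc = PyWeakref_GetRef(ref, &strong);
       if (rc == 1) {
           /* Referent alive: strong is a new strong reference. */
           Py_DECREF(strong);
       }
       /* rc == 0: referent is dead and strong is NULL.
          rc == -1: an error occurred and an exception is set. */
       Py_DECREF(ref);
       return rc < 0 ? -1 : 0;
   }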
PyObject *PyWeakref_GET_OBJECT(PyObject *ref)

   *Return value: Borrowed reference.*

   Similar to "PyWeakref_GetObject()", but does no error checking.

   Deprecated since version 3.13, will be removed in version 3.15: Use "PyWeakref_GetRef()" instead.

void PyObject_ClearWeakRefs(PyObject *object)

   *Part of the Stable ABI.*

   This function is called by the "tp_dealloc" handler to clear weak references. This iterates through the weak references for *object* and calls callbacks for those references which have one. It returns when all callbacks have been attempted.

void PyUnstable_Object_ClearWeakRefsNoCallbacks(PyObject *object)

   *This is Unstable API. It may change without warning in minor releases.*

   Clears the weakrefs for *object* without calling the callbacks.

   This function is called by the "tp_dealloc" handler for types with finalizers (i.e., "__del__()"). The handler for those objects first calls "PyObject_ClearWeakRefs()" to clear weakrefs and call their callbacks, then the finalizer, and finally this function to clear any weakrefs that may have been created by the finalizer.

   In most circumstances, it’s more appropriate to use "PyObject_ClearWeakRefs()" to clear weakrefs instead of this function.

   Added in version 3.13.
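The "tp_dealloc" interaction described above usually takes the following shape for a type whose instances support weak references. The type name and field layout are invented for illustration; a type with a finalizer would follow the more involved sequence described for "PyUnstable_Object_ClearWeakRefsNoCallbacks()".

   #include <Python.h>

   typedef struct {
       PyObject_HEAD
       PyObject *weaklist;      /* referenced by tp_weaklistoffset */
   } MyObject;

   static void
   MyObject_dealloc(PyObject *self)
   {
       /* Clear weak references (running their callbacks) before the
          object is torn down, so callbacks still see a valid object. */
       if (((MyObject *)self)->weaklist != NULL) {
           PyObject_ClearWeakRefs(self);
       }
       Py_TYPE(self)->tp_free(self);
   }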
Modules * array * asyncio * calendar * csv * dis * fractions * importlib.resources * inspect * itertools * math * os * os.path * pathlib * platform * pdb * random * shutil * sqlite3 * statistics * sys * tempfile * threading * tkinter * tokenize * types * typing * unicodedata * unittest * uuid * Optimizations * CPython bytecode changes * Demos and Tools * Deprecated * Pending Removal in Python 3.13 * Pending Removal in Python 3.14 * Pending Removal in Python 3.15 * Pending removal in Python 3.16 * Pending Removal in Future Versions * Removed * asynchat and asyncore * configparser * distutils * ensurepip * enum * ftplib * gzip * hashlib * importlib * imp * io * locale * smtpd * sqlite3 * ssl * unittest * webbrowser * xml.etree.ElementTree * zipimport * Others * Porting to Python 3.12 * Changes in the Python API * Build Changes * C API Changes * New Features * Porting to Python 3.12 * Deprecated * Pending Removal in Python 3.14 * Pending Removal in Python 3.15 * Pending Removal in Future Versions * Removed * What’s New In Python 3.11 * Summary – Release highlights * New Features * PEP 657: Fine-grained error locations in tracebacks * PEP 654: Exception Groups and "except*" * PEP 678: Exceptions can be enriched with notes * Windows "py.exe" launcher improvements * New Features Related to Type Hints * PEP 646: Variadic generics * PEP 655: Marking individual "TypedDict" items as required or not-required * PEP 673: "Self" type * PEP 675: Arbitrary literal string type * PEP 681: Data class transforms * PEP 563 may not be the future * Other Language Changes * Other CPython Implementation Changes * New Modules * Improved Modules * asyncio * contextlib * dataclasses * datetime * enum * fcntl * fractions * functools * gzip * hashlib * IDLE and idlelib * inspect * locale * logging * math * operator * os * pathlib * re * shutil * socket * sqlite3 * string * sys * sysconfig * tempfile * threading * time * tkinter * traceback * typing * unicodedata * unittest * venv * warnings * zipfile * Optimizations * Faster CPython * Faster Startup * Frozen imports / Static code objects * Faster Runtime * Cheaper, lazy Python frames * Inlined Python function calls * PEP 659: Specializing Adaptive Interpreter * Misc * FAQ * How should I write my code to utilize these speedups? * Will CPython 3.11 use more memory? * I don’t see any speedups in my workload. Why? * Is there a JIT compiler? 
* About * CPython bytecode changes * New opcodes * Replaced opcodes * Changed/removed opcodes * Deprecated * Language/Builtins * Modules * Standard Library * Pending Removal in Python 3.12 * Removed * Porting to Python 3.11 * Build Changes * C API Changes * New Features * Porting to Python 3.11 * Deprecated * Pending Removal in Python 3.12 * Removed * Notable changes in 3.11.4 * tarfile * Notable changes in 3.11.5 * OpenSSL * What’s New In Python 3.10 * Summary – Release highlights * New Features * Parenthesized context managers * Better error messages * SyntaxErrors * IndentationErrors * AttributeErrors * NameErrors * PEP 626: Precise line numbers for debugging and other tools * PEP 634: Structural Pattern Matching * Syntax and operations * Declarative approach * Simple pattern: match to a literal * Behavior without the wildcard * Patterns with a literal and variable * Patterns and classes * Patterns with positional parameters * Nested patterns * Complex patterns and the wildcard * Guard * Other Key Features * Optional "EncodingWarning" and "encoding="locale"" option * New Features Related to Type Hints * PEP 604: New Type Union Operator * PEP 612: Parameter Specification Variables * PEP 613: TypeAlias * PEP 647: User-Defined Type Guards * Other Language Changes * New Modules * Improved Modules * asyncio * argparse * array * asynchat, asyncore, smtpd * base64 * bdb * bisect * codecs * collections.abc * contextlib * curses * dataclasses * __slots__ * Keyword-only fields * distutils * doctest * encodings * enum * fileinput * faulthandler * gc * glob * hashlib * hmac * IDLE and idlelib * importlib.metadata * inspect * itertools * linecache * os * os.path * pathlib * platform * pprint * py_compile * pyclbr * shelve * statistics * site * socket * ssl * sqlite3 * sys * _thread * threading * traceback * types * typing * unittest * urllib.parse * xml * zipimport * Optimizations * Deprecated * Removed * Porting to Python 3.10 * Changes in the Python syntax * Changes in the Python API * Changes in the C API * CPython bytecode changes * Build Changes * C API Changes * PEP 652: Maintaining the Stable ABI * New Features * Porting to Python 3.10 * Deprecated * Removed * Notable security feature in 3.10.7 * Notable security feature in 3.10.8 * Notable changes in 3.10.12 * tarfile * What’s New In Python 3.9 * Summary – Release highlights * You should check for DeprecationWarning in your code * New Features * Dictionary Merge & Update Operators * New String Methods to Remove Prefixes and Suffixes * Type Hinting Generics in Standard Collections * New Parser * Other Language Changes * New Modules * zoneinfo * graphlib * Improved Modules * ast * asyncio * compileall * concurrent.futures * curses * datetime * distutils * fcntl * ftplib * gc * hashlib * http * IDLE and idlelib * imaplib * importlib * inspect * ipaddress * math * multiprocessing * nntplib * os * pathlib * pdb * poplib * pprint * pydoc * random * signal * smtplib * socket * time * sys * tracemalloc * typing * unicodedata * venv * xml * Optimizations * Deprecated * Removed * Porting to Python 3.9 * Changes in the Python API * Changes in the C API * CPython bytecode changes * Build Changes * C API Changes * New Features * Porting to Python 3.9 * Removed * Notable changes in Python 3.9.1 * typing * macOS 11.0 (Big Sur) and Apple Silicon Mac support * Notable changes in Python 3.9.2 * collections.abc * urllib.parse * Notable changes in Python 3.9.3 * Notable changes in Python 3.9.5 * urllib.parse * Notable security feature in 3.9.14 * Notable 
changes in 3.9.17 * tarfile * What’s New In Python 3.8 * Summary – Release highlights * New Features * Assignment expressions * Positional-only parameters * Parallel filesystem cache for compiled bytecode files * Debug build uses the same ABI as release build * f-strings support "=" for self-documenting expressions and debugging * PEP 578: Python Runtime Audit Hooks * PEP 587: Python Initialization Configuration * PEP 590: Vectorcall: a fast calling protocol for CPython * Pickle protocol 5 with out-of-band data buffers * Other Language Changes * New Modules * Improved Modules * ast * asyncio * builtins * collections * cProfile * csv * curses * ctypes * datetime * functools * gc * gettext * gzip * IDLE and idlelib * inspect * io * itertools * json.tool * logging * math * mmap * multiprocessing * os * os.path * pathlib * pickle * plistlib * pprint * py_compile * shlex * shutil * socket * ssl * statistics * sys * tarfile * threading * tokenize * tkinter * time * typing * unicodedata * unittest * venv * weakref * xml * xmlrpc * Optimizations * Build and C API Changes * Deprecated * API and Feature Removals * Porting to Python 3.8 * Changes in Python behavior * Changes in the Python API * Changes in the C API * CPython bytecode changes * Demos and Tools * Notable changes in Python 3.8.1 * Notable changes in Python 3.8.2 * Notable changes in Python 3.8.3 * Notable changes in Python 3.8.8 * Notable changes in Python 3.8.9 * Notable changes in Python 3.8.10 * macOS 11.0 (Big Sur) and Apple Silicon Mac support * Notable changes in Python 3.8.10 * urllib.parse * Notable changes in Python 3.8.12 * Changes in the Python API * Notable security feature in 3.8.14 * Notable changes in 3.8.17 * tarfile * What’s New In Python 3.7 * Summary – Release Highlights * New Features * PEP 563: Postponed Evaluation of Annotations * PEP 538: Legacy C Locale Coercion * PEP 540: Forced UTF-8 Runtime Mode * PEP 553: Built-in "breakpoint()" * PEP 539: New C API for Thread-Local Storage * PEP 562: Customization of Access to Module Attributes * PEP 564: New Time Functions With Nanosecond Resolution * PEP 565: Show DeprecationWarning in "__main__" * PEP 560: Core Support for "typing" module and Generic Types * PEP 552: Hash-based .pyc Files * PEP 545: Python Documentation Translations * Python Development Mode (-X dev) * Other Language Changes * New Modules * contextvars * dataclasses * importlib.resources * Improved Modules * argparse * asyncio * binascii * calendar * collections * compileall * concurrent.futures * contextlib * cProfile * crypt * datetime * dbm * decimal * dis * distutils * enum * functools * gc * hmac * http.client * http.server * idlelib and IDLE * importlib * io * ipaddress * itertools * locale * logging * math * mimetypes * msilib * multiprocessing * os * pathlib * pdb * py_compile * pydoc * queue * re * signal * socket * socketserver * sqlite3 * ssl * string * subprocess * sys * time * tkinter * tracemalloc * types * unicodedata * unittest * unittest.mock * urllib.parse * uu * uuid * warnings * xml * xml.etree * xmlrpc.server * zipapp * zipfile * C API Changes * Build Changes * Optimizations * Other CPython Implementation Changes * Deprecated Python Behavior * Deprecated Python modules, functions and methods * aifc * asyncio * collections * dbm * enum * gettext * importlib * locale * macpath * threading * socket * ssl * sunau * sys * wave * Deprecated functions and types of the C API * Platform Support Removals * API and Feature Removals * Module Removals * Windows-only Changes * Porting to Python 3.7 
* Changes in Python Behavior * Changes in the Python API * Changes in the C API * CPython bytecode changes * Windows-only Changes * Other CPython implementation changes * Notable changes in Python 3.7.1 * Notable changes in Python 3.7.2 * Notable changes in Python 3.7.6 * Notable changes in Python 3.7.10 * Notable changes in Python 3.7.11 * Notable security feature in 3.7.14 * What’s New In Python 3.6 * Summary – Release highlights * New Features * PEP 498: Formatted string literals * PEP 526: Syntax for variable annotations * PEP 515: Underscores in Numeric Literals * PEP 525: Asynchronous Generators * PEP 530: Asynchronous Comprehensions * PEP 487: Simpler customization of class creation * PEP 487: Descriptor Protocol Enhancements * PEP 519: Adding a file system path protocol * PEP 495: Local Time Disambiguation * PEP 529: Change Windows filesystem encoding to UTF-8 * PEP 528: Change Windows console encoding to UTF-8 * PEP 520: Preserving Class Attribute Definition Order * PEP 468: Preserving Keyword Argument Order * New *dict* implementation * PEP 523: Adding a frame evaluation API to CPython * PYTHONMALLOC environment variable * DTrace and SystemTap probing support * Other Language Changes * New Modules * secrets * Improved Modules * array * ast * asyncio * binascii * cmath * collections * concurrent.futures * contextlib * datetime * decimal * distutils * email * encodings * enum * faulthandler * fileinput * hashlib * http.client * idlelib and IDLE * importlib * inspect * json * logging * math * multiprocessing * os * pathlib * pdb * pickle * pickletools * pydoc * random * re * readline * rlcompleter * shlex * site * sqlite3 * socket * socketserver * ssl * statistics * struct * subprocess * sys * telnetlib * time * timeit * tkinter * traceback * tracemalloc * typing * unicodedata * unittest.mock * urllib.request * urllib.robotparser * venv * warnings * winreg * winsound * xmlrpc.client * zipfile * zlib * Optimizations * Build and C API Changes * Other Improvements * Deprecated * New Keywords * Deprecated Python behavior * Deprecated Python modules, functions and methods * asynchat * asyncore * dbm * distutils * grp * importlib * os * re * ssl * tkinter * venv * xml * Deprecated functions and types of the C API * Deprecated Build Options * Removed * API and Feature Removals * Porting to Python 3.6 * Changes in ‘python’ Command Behavior * Changes in the Python API * Changes in the C API * CPython bytecode changes * Notable changes in Python 3.6.2 * New "make regen-all" build target * Removal of "make touch" build target * Notable changes in Python 3.6.4 * Notable changes in Python 3.6.5 * Notable changes in Python 3.6.7 * Notable changes in Python 3.6.10 * Notable changes in Python 3.6.13 * Notable changes in Python 3.6.14 * What’s New In Python 3.5 * Summary – Release highlights * New Features * PEP 492 - Coroutines with async and await syntax * PEP 465 - A dedicated infix operator for matrix multiplication * PEP 448 - Additional Unpacking Generalizations * PEP 461 - percent formatting support for bytes and bytearray * PEP 484 - Type Hints * PEP 471 - os.scandir() function – a better and faster directory iterator * PEP 475: Retry system calls failing with EINTR * PEP 479: Change StopIteration handling inside generators * PEP 485: A function for testing approximate equality * PEP 486: Make the Python Launcher aware of virtual environments * PEP 488: Elimination of PYO files * PEP 489: Multi-phase extension module initialization * Other Language Changes * New Modules * typing * zipapp * 
Improved Modules * argparse * asyncio * bz2 * cgi * cmath * code * collections * collections.abc * compileall * concurrent.futures * configparser * contextlib * csv * curses * dbm * difflib * distutils * doctest * email * enum * faulthandler * functools * glob * gzip * heapq * http * http.client * idlelib and IDLE * imaplib * imghdr * importlib * inspect * io * ipaddress * json * linecache * locale * logging * lzma * math * multiprocessing * operator * os * pathlib * pickle * poplib * re * readline * selectors * shutil * signal * smtpd * smtplib * sndhdr * socket * ssl * Memory BIO Support * Application-Layer Protocol Negotiation Support * Other Changes * sqlite3 * subprocess * sys * sysconfig * tarfile * threading * time * timeit * tkinter * traceback * types * unicodedata * unittest * unittest.mock * urllib * wsgiref * xmlrpc * xml.sax * zipfile * Other module-level changes * Optimizations * Build and C API Changes * Deprecated * New Keywords * Deprecated Python Behavior * Unsupported Operating Systems * Deprecated Python modules, functions and methods * Removed * API and Feature Removals * Porting to Python 3.5 * Changes in Python behavior * Changes in the Python API * Changes in the C API * Notable changes in Python 3.5.4 * New "make regen-all" build target * Removal of "make touch" build target * What’s New In Python 3.4 * Summary – Release Highlights * New Features * PEP 453: Explicit Bootstrapping of PIP in Python Installations * Bootstrapping pip By Default * Documentation Changes * PEP 446: Newly Created File Descriptors Are Non-Inheritable * Improvements to Codec Handling * PEP 451: A ModuleSpec Type for the Import System * Other Language Changes * New Modules * asyncio * ensurepip * enum * pathlib * selectors * statistics * tracemalloc * Improved Modules * abc * aifc * argparse * audioop * base64 * collections * colorsys * contextlib * dbm * dis * doctest * email * filecmp * functools * gc * glob * hashlib * hmac * html * http * idlelib and IDLE * importlib * inspect * ipaddress * logging * marshal * mmap * multiprocessing * operator * os * pdb * pickle * plistlib * poplib * pprint * pty * pydoc * re * resource * select * shelve * shutil * smtpd * smtplib * socket * sqlite3 * ssl * stat * struct * subprocess * sunau * sys * tarfile * textwrap * threading * traceback * types * urllib * unittest * venv * wave * weakref * xml.etree * zipfile * CPython Implementation Changes * PEP 445: Customization of CPython Memory Allocators * PEP 442: Safe Object Finalization * PEP 456: Secure and Interchangeable Hash Algorithm * PEP 436: Argument Clinic * Other Build and C API Changes * Other Improvements * Significant Optimizations * Deprecated * Deprecations in the Python API * Deprecated Features * Removed * Operating Systems No Longer Supported * API and Feature Removals * Code Cleanups * Porting to Python 3.4 * Changes in ‘python’ Command Behavior * Changes in the Python API * Changes in the C API * Changed in 3.4.3 * PEP 476: Enabling certificate verification by default for stdlib http clients * What’s New In Python 3.3 * Summary – Release highlights * PEP 405: Virtual Environments * PEP 420: Implicit Namespace Packages * PEP 3118: New memoryview implementation and buffer protocol documentation * Features * API changes * PEP 393: Flexible String Representation * Functionality * Performance and resource usage * PEP 397: Python Launcher for Windows * PEP 3151: Reworking the OS and IO exception hierarchy * PEP 380: Syntax for Delegating to a Subgenerator * PEP 409: Suppressing exception 
context * PEP 414: Explicit Unicode literals * PEP 3155: Qualified name for classes and functions * PEP 412: Key-Sharing Dictionary * PEP 362: Function Signature Object * PEP 421: Adding sys.implementation * SimpleNamespace * Using importlib as the Implementation of Import * New APIs * Visible Changes * Other Language Changes * A Finer-Grained Import Lock * Builtin functions and types * New Modules * faulthandler * ipaddress * lzma * Improved Modules * abc * array * base64 * binascii * bz2 * codecs * collections * contextlib * crypt * curses * datetime * decimal * Features * API changes * email * Policy Framework * Provisional Policy with New Header API * Other API Changes * ftplib * functools * gc * hmac * http * html * imaplib * inspect * io * itertools * logging * math * mmap * multiprocessing * nntplib * os * pdb * pickle * pydoc * re * sched * select * shlex * shutil * signal * smtpd * smtplib * socket * socketserver * sqlite3 * ssl * stat * struct * subprocess * sys * tarfile * tempfile * textwrap * threading * time * types * unittest * urllib * webbrowser * xml.etree.ElementTree * zlib * Optimizations * Build and C API Changes * Deprecated * Unsupported Operating Systems * Deprecated Python modules, functions and methods * Deprecated functions and types of the C API * Deprecated features * Porting to Python 3.3 * Porting Python code * Porting C code * Building C extensions * Command Line Switch Changes * What’s New In Python 3.2 * PEP 384: Defining a Stable ABI * PEP 389: Argparse Command Line Parsing Module * PEP 391: Dictionary Based Configuration for Logging * PEP 3148: The "concurrent.futures" module * PEP 3147: PYC Repository Directories * PEP 3149: ABI Version Tagged .so Files * PEP 3333: Python Web Server Gateway Interface v1.0.1 * Other Language Changes * New, Improved, and Deprecated Modules * email * elementtree * functools * itertools * collections * threading * datetime and time * math * abc * io * reprlib * logging * csv * contextlib * decimal and fractions * ftp * popen * select * gzip and zipfile * tarfile * hashlib * ast * os * shutil * sqlite3 * html * socket * ssl * nntp * certificates * imaplib * http.client * unittest * random * poplib * asyncore * tempfile * inspect * pydoc * dis * dbm * ctypes * site * sysconfig * pdb * configparser * urllib.parse * mailbox * turtledemo * Multi-threading * Optimizations * Unicode * Codecs * Documentation * IDLE * Code Repository * Build and C API Changes * Porting to Python 3.2 * What’s New In Python 3.1 * PEP 372: Ordered Dictionaries * PEP 378: Format Specifier for Thousands Separator * Other Language Changes * New, Improved, and Deprecated Modules * Optimizations * IDLE * Build and C API Changes * Porting to Python 3.1 * What’s New In Python 3.0 * Common Stumbling Blocks * Print Is A Function * Views And Iterators Instead Of Lists * Ordering Comparisons * Integers * Text Vs. Data Instead Of Unicode Vs. 
8-bit * Overview Of Syntax Changes * New Syntax * Changed Syntax * Removed Syntax * Changes Already Present In Python 2.6 * Library Changes * **PEP 3101**: A New Approach To String Formatting * Changes To Exceptions * Miscellaneous Other Changes * Operators And Special Methods * Builtins * Build and C API Changes * Performance * Porting To Python 3.0 * What’s New in Python 2.7 * The Future for Python 2.x * Changes to the Handling of Deprecation Warnings * Python 3.1 Features * PEP 372: Adding an Ordered Dictionary to collections * PEP 378: Format Specifier for Thousands Separator * PEP 389: The argparse Module for Parsing Command Lines * PEP 391: Dictionary-Based Configuration For Logging * PEP 3106: Dictionary Views * PEP 3137: The memoryview Object * Other Language Changes * Interpreter Changes * Optimizations * New and Improved Modules * New module: importlib * New module: sysconfig * ttk: Themed Widgets for Tk * Updated module: unittest * Updated module: ElementTree 1.3 * Build and C API Changes * Capsules * Port-Specific Changes: Windows * Port-Specific Changes: Mac OS X * Port-Specific Changes: FreeBSD * Other Changes and Fixes * Porting to Python 2.7 * New Features Added to Python 2.7 Maintenance Releases * Two new environment variables for debug mode * PEP 434: IDLE Enhancement Exception for All Branches * PEP 466: Network Security Enhancements for Python 2.7 * PEP 477: Backport ensurepip (PEP 453) to Python 2.7 * Bootstrapping pip By Default * Documentation Changes * PEP 476: Enabling certificate verification by default for stdlib http clients * PEP 493: HTTPS verification migration tools for Python 2.7 * New "make regen-all" build target * Removal of "make touch" build target * Acknowledgements * What’s New in Python 2.6 * Python 3.0 * Changes to the Development Process * New Issue Tracker: Roundup * New Documentation Format: reStructuredText Using Sphinx * PEP 343: The ‘with’ statement * Writing Context Managers * The contextlib module * PEP 366: Explicit Relative Imports From a Main Module * PEP 370: Per-user "site-packages" Directory * PEP 371: The "multiprocessing" Package * PEP 3101: Advanced String Formatting * PEP 3105: "print" As a Function * PEP 3110: Exception-Handling Changes * PEP 3112: Byte Literals * PEP 3116: New I/O Library * PEP 3118: Revised Buffer Protocol * PEP 3119: Abstract Base Classes * PEP 3127: Integer Literal Support and Syntax * PEP 3129: Class Decorators * PEP 3141: A Type Hierarchy for Numbers * The "fractions" Module * Other Language Changes * Optimizations * Interpreter Changes * New and Improved Modules * The "ast" module * The "future_builtins" module * The "json" module: JavaScript Object Notation * The "plistlib" module: A Property-List Parser * ctypes Enhancements * Improved SSL Support * Deprecations and Removals * Build and C API Changes * Port-Specific Changes: Windows * Port-Specific Changes: Mac OS X * Port-Specific Changes: IRIX * Porting to Python 2.6 * Acknowledgements * What’s New in Python 2.5 * PEP 308: Conditional Expressions * PEP 309: Partial Function Application * PEP 314: Metadata for Python Software Packages v1.1 * PEP 328: Absolute and Relative Imports * PEP 338: Executing Modules as Scripts * PEP 341: Unified try/except/finally * PEP 342: New Generator Features * PEP 343: The ‘with’ statement * Writing Context Managers * The contextlib module * PEP 352: Exceptions as New-Style Classes * PEP 353: Using ssize_t as the index type * PEP 357: The ‘__index__’ method * Other Language Changes * Interactive Interpreter Changes * 
Optimizations * New, Improved, and Removed Modules * The ctypes package * The ElementTree package * The hashlib package * The sqlite3 package * The wsgiref package * Build and C API Changes * Port-Specific Changes * Porting to Python 2.5 * Acknowledgements * What’s New in Python 2.4 * PEP 218: Built-In Set Objects * PEP 237: Unifying Long Integers and Integers * PEP 289: Generator Expressions * PEP 292: Simpler String Substitutions * PEP 318: Decorators for Functions and Methods * PEP 322: Reverse Iteration * PEP 324: New subprocess Module * PEP 327: Decimal Data Type * Why is Decimal needed? * The "Decimal" type * The "Context" type * PEP 328: Multi-line Imports * PEP 331: Locale-Independent Float/String Conversions * Other Language Changes * Optimizations * New, Improved, and Deprecated Modules * cookielib * doctest * Build and C API Changes * Port-Specific Changes * Porting to Python 2.4 * Acknowledgements * What’s New in Python 2.3 * PEP 218: A Standard Set Datatype * PEP 255: Simple Generators * PEP 263: Source Code Encodings * PEP 273: Importing Modules from ZIP Archives * PEP 277: Unicode file name support for Windows NT * PEP 278: Universal Newline Support * PEP 279: enumerate() * PEP 282: The logging Package * PEP 285: A Boolean Type * PEP 293: Codec Error Handling Callbacks * PEP 301: Package Index and Metadata for Distutils * PEP 302: New Import Hooks * PEP 305: Comma-separated Files * PEP 307: Pickle Enhancements * Extended Slices * Other Language Changes * String Changes * Optimizations * New, Improved, and Deprecated Modules * Date/Time Type * The optparse Module * Pymalloc: A Specialized Object Allocator * Build and C API Changes * Port-Specific Changes * Other Changes and Fixes * Porting to Python 2.3 * Acknowledgements * What’s New in Python 2.2 * Introduction * PEPs 252 and 253: Type and Class Changes * Old and New Classes * Descriptors * Multiple Inheritance: The Diamond Rule * Attribute Access * Related Links * PEP 234: Iterators * PEP 255: Simple Generators * PEP 237: Unifying Long Integers and Integers * PEP 238: Changing the Division Operator * Unicode Changes * PEP 227: Nested Scopes * New and Improved Modules * Interpreter Changes and Fixes * Other Changes and Fixes * Acknowledgements * What’s New in Python 2.1 * Introduction * PEP 227: Nested Scopes * PEP 236: __future__ Directives * PEP 207: Rich Comparisons * PEP 230: Warning Framework * PEP 229: New Build System * PEP 205: Weak References * PEP 232: Function Attributes * PEP 235: Importing Modules on Case-Insensitive Platforms * PEP 217: Interactive Display Hook * PEP 208: New Coercion Model * PEP 241: Metadata in Python Packages * New and Improved Modules * Other Changes and Fixes * Acknowledgements * What’s New in Python 2.0 * Introduction * What About Python 1.6? 
* New Development Process * Unicode * List Comprehensions * Augmented Assignment * String Methods * Garbage Collection of Cycles * Other Core Changes * Minor Language Changes * Changes to Built-in Functions * Porting to 2.0 * Extending/Embedding Changes * Distutils: Making Modules Easy to Install * XML Modules * SAX2 Support * DOM Support * Relationship to PyXML * Module changes * New modules * IDLE Improvements * Deleted and Deprecated Modules * Acknowledgements * Changelog * Python 3.13.5 final * Windows * Tests * Library * Core and Builtins * C API * Python 3.13.4 final * Windows * Tests * Security * Library * IDLE * Documentation * Core and Builtins * C API * Build * Python 3.13.3 final * macOS * Windows * Tools/Demos * Tests * Security * Library * IDLE * Documentation * Core and Builtins * C API * Build * Python 3.13.2 final * macOS * Windows * Tools/Demos * Tests * Security * Library * Documentation * Core and Builtins * C API * Build * Python 3.13.1 final * macOS * Windows * Tools/Demos * Tests * Security * Library * IDLE * Documentation * Core and Builtins * C API * Build * Python 3.13.0 final * Core and Builtins * Python 3.13.0 release candidate 3 * macOS * Windows * Tests * Library * IDLE * Documentation * Core and Builtins * C API * Build * Python 3.13.0 release candidate 2 * macOS * Windows * Tools/Demos * Tests * Security * Library * IDLE * Core and Builtins * C API * Build * Python 3.13.0 release candidate 1 * Tests * Security * Library * IDLE * Core and Builtins * C API * Build * Python 3.13.0 beta 4 * Tests * Library * IDLE * Documentation * Core and Builtins * C API * Build * Python 3.13.0 beta 3 * Core and Builtins * Library * Build * C API * Python 3.13.0 beta 2 * Security * Core and Builtins * Library * Tests * Build * Windows * C API * Python 3.13.0 beta 1 * Security * Core and Builtins * Library * Documentation * Build * Windows * macOS * IDLE * C API * Python 3.13.0 alpha 6 * Core and Builtins * Library * Documentation * Tests * Build * Windows * C API * Python 3.13.0 alpha 5 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.13.0 alpha 4 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.13.0 alpha 3 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API * Python 3.13.0 alpha 2 * Core and Builtins * Library * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.13.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.12.0 beta 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.12.0 alpha 7 * Core and Builtins * Library * Documentation * Tests * Build * Windows * Tools/Demos * C API * Python 3.12.0 alpha 6 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * C API * Python 3.12.0 alpha 5 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * Python 3.12.0 alpha 4 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * Tools/Demos * C API * Python 3.12.0 alpha 3 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * Tools/Demos * C API * Python 3.12.0 alpha 2 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * 
macOS * C API
* Python 3.12.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.11.0 beta 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * Tools/Demos * C API
* Python 3.11.0 alpha 7 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * Tools/Demos * C API
* Python 3.11.0 alpha 6 * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * C API
* Python 3.11.0 alpha 5 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API
* Python 3.11.0 alpha 4 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * C API
* Python 3.11.0 alpha 3 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * C API
* Python 3.11.0 alpha 2 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API
* Python 3.11.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.10.0 beta 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API
* Python 3.10.0 alpha 7 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * C API
* Python 3.10.0 alpha 6 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API
* Python 3.10.0 alpha 5 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API
* Python 3.10.0 alpha 4 * Core and Builtins * Library * Documentation * Tests * Build * macOS * Tools/Demos * C API
* Python 3.10.0 alpha 3 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.10.0 alpha 2 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API
* Python 3.10.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API
* Python 3.9.0 beta 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * Tools/Demos * C API
* Python 3.9.0 alpha 6 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.9.0 alpha 5 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.9.0 alpha 4 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * C API
* Python 3.9.0 alpha 3 * Core and Builtins * Library * Documentation * Build * IDLE * C API
* Python 3.9.0 alpha 2 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API
* Python 3.9.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.8.0 beta 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.8.0 alpha 4 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.8.0 alpha 3 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * Tools/Demos * C API
* Python 3.8.0 alpha 2 * Core and Builtins * Library * Documentation * Tests * Windows * IDLE
* Python 3.8.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.7.0 final * Library * C API
* Python 3.7.0 release candidate 1 * Core and Builtins * Library * Documentation * Build * Windows * IDLE
* Python 3.7.0 beta 5 * Core and Builtins * Library * Documentation * Tests * Build * macOS * IDLE
* Python 3.7.0 beta 4 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos
* Python 3.7.0 beta 3 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.7.0 beta 2 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos
* Python 3.7.0 beta 1 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * C API
* Python 3.7.0 alpha 4 * Core and Builtins * Library * Documentation * Tests * Windows * Tools/Demos * C API
* Python 3.7.0 alpha 3 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.7.0 alpha 2 * Core and Builtins * Library * Documentation * Build * IDLE * C API
* Python 3.7.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * Tools/Demos * C API
* Python 3.6.6 final
* Python 3.6.6 release candidate 1 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.6.5 final * Tests * Build
* Python 3.6.5 release candidate 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.6.4 final
* Python 3.6.4 release candidate 1 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API
* Python 3.6.3 final * Library * Build
* Python 3.6.3 release candidate 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * Tools/Demos
* Python 3.6.2 final
* Python 3.6.2 release candidate 2 * Security
* Python 3.6.2 release candidate 1 * Security * Core and Builtins * Library * IDLE * C API * Build * Documentation * Tools/Demos * Tests * Windows
* Python 3.6.1 final * Core and Builtins * Build
* Python 3.6.1 release candidate 1 * Core and Builtins * Library * IDLE * Windows * C API * Documentation * Tests * Build
* Python 3.6.0 final
* Python 3.6.0 release candidate 2 * Core and Builtins * Tools/Demos * Windows * Build
* Python 3.6.0 release candidate 1 * Core and Builtins * Library * C API * Documentation * Tools/Demos
* Python 3.6.0 beta 4 * Core and Builtins * Library * Documentation * Tests * Build
* Python 3.6.0 beta 3 * Core and Builtins * Library * Windows * Build * Tests
* Python 3.6.0 beta 2 * Core and Builtins * Library * Windows * C API * Build * Tests
* Python 3.6.0 beta 1 * Core and Builtins * Library * IDLE * C API * Tests * Build * Tools/Demos * Windows
* Python 3.6.0 alpha 4 * Core and Builtins * Library * IDLE * Tests * Windows * Build
* Python 3.6.0 alpha 3 * Security * Core and Builtins * Library * IDLE * C API * Build * Tools/Demos * Documentation * Tests
* Python 3.6.0 alpha 2 * Security * Core and Builtins * Library * IDLE * Documentation * Tests * Windows * Build * C API * Tools/Demos
* Python 3.6.0 alpha 1 * Security * Core and Builtins * Library * IDLE * Documentation * Tests * Build * Windows * Tools/Demos * C API
* Python 3.5.5 final
* Python 3.5.5 release candidate 1 * Security * Core and Builtins * Library
* Python 3.5.4 final * Library
* Python 3.5.4 release candidate 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * C API
* Python 3.5.3 final
* Python 3.5.3 release candidate 1 * Security * Core and Builtins * Library * IDLE * C API * Documentation * Tests * Tools/Demos * Windows * Build
* Python 3.5.2 final * Core and Builtins * Tests * IDLE
* Python 3.5.2 release candidate 1 * Security * Core and Builtins * Library * IDLE * Documentation * Tests * Build * Windows * Tools/Demos
* Python 3.5.1 final * Core and Builtins * Windows
* Python 3.5.1 release candidate 1 * Core and Builtins * Library * IDLE * Documentation * Tests * Build * Windows * Tools/Demos
* Python 3.5.0 final * Build
* Python 3.5.0 release candidate 4 * Library * Build
* Python 3.5.0 release candidate 3 * Core and Builtins * Library
* Python 3.5.0 release candidate 2 * Core and Builtins * Library
* Python 3.5.0 release candidate 1 * Core and Builtins * Library * IDLE * Documentation * Tests
* Python 3.5.0 beta 4 * Core and Builtins * Library * Build
* Python 3.5.0 beta 3 * Core and Builtins * Library * Tests * Documentation * Build
* Python 3.5.0 beta 2 * Core and Builtins * Library
* Python 3.5.0 beta 1 * Core and Builtins * Library * IDLE * Tests * Documentation * Tools/Demos
* Python 3.5.0 alpha 4 * Core and Builtins * Library * Build * Tests * Tools/Demos * C API
* Python 3.5.0 alpha 3 * Core and Builtins * Library * Build * Tests * Tools/Demos
* Python 3.5.0 alpha 2 * Core and Builtins * Library * Build * C API * Windows
* Python 3.5.0 alpha 1 * Core and Builtins * Library * IDLE * Build * C API * Documentation * Tests * Tools/Demos * Windows
* The Python Tutorial * 1. Whetting Your Appetite * 2. Using the Python Interpreter * 2.1. Invoking the Interpreter * 2.1.1. Argument Passing * 2.1.2. Interactive Mode * 2.2. The Interpreter and Its Environment * 2.2.1. Source Code Encoding * 3. An Informal Introduction to Python * 3.1. Using Python as a Calculator * 3.1.1. Numbers * 3.1.2. Text * 3.1.3. Lists * 3.2. First Steps Towards Programming * 4. More Control Flow Tools * 4.1. "if" Statements * 4.2. "for" Statements * 4.3. The "range()" Function * 4.4. "break" and "continue" Statements * 4.5. "else" Clauses on Loops * 4.6. "pass" Statements * 4.7. "match" Statements * 4.8. Defining Functions * 4.9. More on Defining Functions * 4.9.1. Default Argument Values * 4.9.2. Keyword Arguments * 4.9.3. Special parameters * 4.9.3.1. Positional-or-Keyword Arguments * 4.9.3.2. Positional-Only Parameters * 4.9.3.3. Keyword-Only Arguments * 4.9.3.4. Function Examples * 4.9.3.5. Recap * 4.9.4. Arbitrary Argument Lists * 4.9.5. Unpacking Argument Lists * 4.9.6. Lambda Expressions * 4.9.7. Documentation Strings * 4.9.8. Function Annotations * 4.10. Intermezzo: Coding Style * 5. Data Structures * 5.1. More on Lists * 5.1.1. Using Lists as Stacks * 5.1.2. Using Lists as Queues * 5.1.3. List Comprehensions * 5.1.4. Nested List Comprehensions * 5.2. The "del" statement * 5.3. Tuples and Sequences * 5.4. Sets * 5.5. Dictionaries * 5.6. Looping Techniques * 5.7. More on Conditions * 5.8. Comparing Sequences and Other Types * 6. Modules * 6.1. More on Modules * 6.1.1. Executing modules as scripts * 6.1.2. The Module Search Path * 6.1.3. “Compiled” Python files * 6.2. Standard Modules * 6.3. The "dir()" Function * 6.4. Packages * 6.4.1. Importing "*" From a Package * 6.4.2. Intra-package References * 6.4.3.
Packages in Multiple Directories * 7. Input and Output * 7.1. Fancier Output Formatting * 7.1.1. Formatted String Literals * 7.1.2. The String format() Method * 7.1.3. Manual String Formatting * 7.1.4. Old string formatting * 7.2. Reading and Writing Files * 7.2.1. Methods of File Objects * 7.2.2. Saving structured data with "json" * 8. Errors and Exceptions * 8.1. Syntax Errors * 8.2. Exceptions * 8.3. Handling Exceptions * 8.4. Raising Exceptions * 8.5. Exception Chaining * 8.6. User-defined Exceptions * 8.7. Defining Clean-up Actions * 8.8. Predefined Clean-up Actions * 8.9. Raising and Handling Multiple Unrelated Exceptions * 8.10. Enriching Exceptions with Notes * 9. Classes * 9.1. A Word About Names and Objects * 9.2. Python Scopes and Namespaces * 9.2.1. Scopes and Namespaces Example * 9.3. A First Look at Classes * 9.3.1. Class Definition Syntax * 9.3.2. Class Objects * 9.3.3. Instance Objects * 9.3.4. Method Objects * 9.3.5. Class and Instance Variables * 9.4. Random Remarks * 9.5. Inheritance * 9.5.1. Multiple Inheritance * 9.6. Private Variables * 9.7. Odds and Ends * 9.8. Iterators * 9.9. Generators * 9.10. Generator Expressions * 10. Brief Tour of the Standard Library * 10.1. Operating System Interface * 10.2. File Wildcards * 10.3. Command Line Arguments * 10.4. Error Output Redirection and Program Termination * 10.5. String Pattern Matching * 10.6. Mathematics * 10.7. Internet Access * 10.8. Dates and Times * 10.9. Data Compression * 10.10. Performance Measurement * 10.11. Quality Control * 10.12. Batteries Included * 11. Brief Tour of the Standard Library — Part II * 11.1. Output Formatting * 11.2. Templating * 11.3. Working with Binary Data Record Layouts * 11.4. Multi-threading * 11.5. Logging * 11.6. Weak References * 11.7. Tools for Working with Lists * 11.8. Decimal Floating-Point Arithmetic * 12. Virtual Environments and Packages * 12.1. Introduction * 12.2. Creating Virtual Environments * 12.3. Managing Packages with pip * 13. What Now? * 14. Interactive Input Editing and History Substitution * 14.1. Tab Completion and History Editing * 14.2. Alternatives to the Interactive Interpreter * 15. Floating-Point Arithmetic: Issues and Limitations * 15.1. Representation Error * 16. Appendix * 16.1. Interactive Mode * 16.1.1. Error Handling * 16.1.2. Executable Python Scripts * 16.1.3. The Interactive Startup File * 16.1.4. The Customization Modules
* Python Setup and Usage * 1. Command line and environment * 1.1. Command line * 1.1.1. Interface options * 1.1.2. Generic options * 1.1.3. Miscellaneous options * 1.1.4. Controlling color * 1.1.5. Options you shouldn’t use * 1.2. Environment variables * 1.2.1. Debug-mode variables * 2. Using Python on Unix platforms * 2.1. Getting and installing the latest version of Python * 2.1.1. On Linux * 2.1.1.1. Installing IDLE * 2.1.2. On FreeBSD and OpenBSD * 2.2. Building Python * 2.3. Python-related paths and files * 2.4. Miscellaneous * 2.5. Custom OpenSSL * 3. Configure Python * 3.1. Build Requirements * 3.2. Generated files * 3.2.1. configure script * 3.3. Configure Options * 3.3.1. General Options * 3.3.2. C compiler options * 3.3.3. Linker options * 3.3.4. Options for third-party dependencies * 3.3.5. WebAssembly Options * 3.3.6. Install Options * 3.3.7. Performance options * 3.3.8. Python Debug Build * 3.3.9. Debug options * 3.3.10. Linker options * 3.3.11. Libraries options * 3.3.12. Security Options * 3.3.13. macOS Options * 3.3.14. iOS Options * 3.3.15. Cross Compiling Options * 3.4. Python Build System * 3.4.1.
Main files of the build system * 3.4.2. Main build steps * 3.4.3. Main Makefile targets * 3.4.3.1. make * 3.4.3.2. make platform * 3.4.3.3. make profile-opt * 3.4.3.4. make clean * 3.4.3.5. make distclean * 3.4.3.6. make install * 3.4.3.7. make test * 3.4.3.8. make buildbottest * 3.4.3.9. make regen-all * 3.4.4. C extensions * 3.5. Compiler and linker flags * 3.5.1. Preprocessor flags * 3.5.2. Compiler flags * 3.5.3. Linker flags * 4. Using Python on Windows * 4.1. The full installer * 4.1.1. Installation steps * 4.1.2. Removing the MAX_PATH Limitation * 4.1.3. Installing Without UI * 4.1.4. Installing Without Downloading * 4.1.5. Modifying an install * 4.1.6. Installing Free-threaded Binaries * 4.2. The Microsoft Store package * 4.2.1. Known issues * 4.2.1.1. Redirection of local data, registry, and temporary paths * 4.3. The nuget.org packages * 4.3.1. Free-threaded packages * 4.4. The embeddable package * 4.4.1. Python Application * 4.4.2. Embedding Python * 4.5. Alternative bundles * 4.6. Configuring Python * 4.6.1. Excursus: Setting environment variables * 4.6.2. Finding the Python executable * 4.7. UTF-8 mode * 4.8. Python Launcher for Windows * 4.8.1. Getting started * 4.8.1.1. From the command-line * 4.8.1.2. Virtual environments * 4.8.1.3. From a script * 4.8.1.4. From file associations * 4.8.2. Shebang Lines * 4.8.3. Arguments in shebang lines * 4.8.4. Customization * 4.8.4.1. Customization via INI files * 4.8.4.2. Customizing default Python versions * 4.8.5. Diagnostics * 4.8.6. Dry Run * 4.8.7. Install on demand * 4.8.8. Return codes * 4.9. Finding modules * 4.10. Additional modules * 4.10.1. PyWin32 * 4.10.2. cx_Freeze * 4.11. Compiling Python on Windows * 4.12. Other Platforms * 5. Using Python on macOS * 5.1. Using Python for macOS from "python.org" * 5.1.1. Installation steps * 5.1.2. How to run a Python script * 5.2. Alternative Distributions * 5.3. Installing Additional Python Packages * 5.4. GUI Programming * 5.5. Advanced Topics * 5.5.1. Installing Free-threaded Binaries * 5.5.2. Installing using the command line * 5.5.3. Distributing Python Applications * 5.5.4. App Store Compliance * 5.6. Other Resources * 6. Using Python on Android * 6.1. Adding Python to an Android app * 6.2. Building a Python package for Android * 7. Using Python on iOS * 7.1. Python at runtime on iOS * 7.1.1. iOS version compatibility * 7.1.2. Platform identification * 7.1.3. Standard library availability * 7.1.4. Binary extension modules * 7.1.5. Compiler stub binaries * 7.2. Installing Python on iOS * 7.2.1. Tools for building iOS apps * 7.2.2. Adding Python to an iOS project * 7.2.3. Testing a Python package * 7.3. App Store Compliance * 8. Editors and IDEs * 8.1. IDLE — Python editor and shell * 8.2. Other Editors and IDEs
* The Python Language Reference * 1. Introduction * 1.1. Alternate Implementations * 1.2. Notation * 2. Lexical analysis * 2.1. Line structure * 2.1.1. Logical lines * 2.1.2. Physical lines * 2.1.3. Comments * 2.1.4. Encoding declarations * 2.1.5. Explicit line joining * 2.1.6. Implicit line joining * 2.1.7. Blank lines * 2.1.8. Indentation * 2.1.9. Whitespace between tokens * 2.2. Other tokens * 2.3. Identifiers and keywords * 2.3.1. Keywords * 2.3.2. Soft Keywords * 2.3.3. Reserved classes of identifiers * 2.4. Literals * 2.4.1. String and Bytes literals * 2.4.1.1. Escape sequences * 2.4.2. String literal concatenation * 2.4.3. f-strings * 2.4.4. Numeric literals * 2.4.5. Integer literals * 2.4.6. Floating-point literals * 2.4.7. Imaginary literals * 2.5. Operators * 2.6.
Delimiters * 3. Data model * 3.1. Objects, values and types * 3.2. The standard type hierarchy * 3.2.1. None * 3.2.2. NotImplemented * 3.2.3. Ellipsis * 3.2.4. "numbers.Number" * 3.2.4.1. "numbers.Integral" * 3.2.4.2. "numbers.Real" ("float") * 3.2.4.3. "numbers.Complex" ("complex") * 3.2.5. Sequences * 3.2.5.1. Immutable sequences * 3.2.5.2. Mutable sequences * 3.2.6. Set types * 3.2.7. Mappings * 3.2.7.1. Dictionaries * 3.2.8. Callable types * 3.2.8.1. User-defined functions * 3.2.8.1.1. Special read-only attributes * 3.2.8.1.2. Special writable attributes * 3.2.8.2. Instance methods * 3.2.8.3. Generator functions * 3.2.8.4. Coroutine functions * 3.2.8.5. Asynchronous generator functions * 3.2.8.6. Built-in functions * 3.2.8.7. Built-in methods * 3.2.8.8. Classes * 3.2.8.9. Class Instances * 3.2.9. Modules * 3.2.9.1. Import-related attributes on module objects * 3.2.9.2. Other writable attributes on module objects * 3.2.9.3. Module dictionaries * 3.2.10. Custom classes * 3.2.10.1. Special attributes * 3.2.10.2. Special methods * 3.2.11. Class instances * 3.2.11.1. Special attributes * 3.2.12. I/O objects (also known as file objects) * 3.2.13. Internal types * 3.2.13.1. Code objects * 3.2.13.1.1. Special read-only attributes * 3.2.13.1.2. Methods on code objects * 3.2.13.2. Frame objects * 3.2.13.2.1. Special read-only attributes * 3.2.13.2.2. Special writable attributes * 3.2.13.2.3. Frame object methods * 3.2.13.3. Traceback objects * 3.2.13.4. Slice objects * 3.2.13.5. Static method objects * 3.2.13.6. Class method objects * 3.3. Special method names * 3.3.1. Basic customization * 3.3.2. Customizing attribute access * 3.3.2.1. Customizing module attribute access * 3.3.2.2. Implementing Descriptors * 3.3.2.3. Invoking Descriptors * 3.3.2.4. __slots__ * 3.3.3. Customizing class creation * 3.3.3.1. Metaclasses * 3.3.3.2. Resolving MRO entries * 3.3.3.3. Determining the appropriate metaclass * 3.3.3.4. Preparing the class namespace * 3.3.3.5. Executing the class body * 3.3.3.6. Creating the class object * 3.3.3.7. Uses for metaclasses * 3.3.4. Customizing instance and subclass checks * 3.3.5. Emulating generic types * 3.3.5.1. The purpose of *__class_getitem__* * 3.3.5.2. *__class_getitem__* versus *__getitem__* * 3.3.6. Emulating callable objects * 3.3.7. Emulating container types * 3.3.8. Emulating numeric types * 3.3.9. With Statement Context Managers * 3.3.10. Customizing positional arguments in class pattern matching * 3.3.11. Emulating buffer types * 3.3.12. Special method lookup * 3.4. Coroutines * 3.4.1. Awaitable Objects * 3.4.2. Coroutine Objects * 3.4.3. Asynchronous Iterators * 3.4.4. Asynchronous Context Managers * 4. Execution model * 4.1. Structure of a program * 4.2. Naming and binding * 4.2.1. Binding of names * 4.2.2. Resolution of names * 4.2.3. Annotation scopes * 4.2.4. Lazy evaluation * 4.2.5. Builtins and restricted execution * 4.2.6. Interaction with dynamic features * 4.3. Exceptions * 5. The import system * 5.1. "importlib" * 5.2. Packages * 5.2.1. Regular packages * 5.2.2. Namespace packages * 5.3. Searching * 5.3.1. The module cache * 5.3.2. Finders and loaders * 5.3.3. Import hooks * 5.3.4. The meta path * 5.4. Loading * 5.4.1. Loaders * 5.4.2. Submodules * 5.4.3. Module specs * 5.4.4. __path__ attributes on modules * 5.4.5. Module reprs * 5.4.6. Cached bytecode invalidation * 5.5. The Path Based Finder * 5.5.1. Path entry finders * 5.5.2. Path entry finder protocol * 5.6. Replacing the standard import system * 5.7. Package Relative Imports * 5.8. 
Special considerations for __main__ * 5.8.1. __main__.__spec__ * 5.9. References * 6. Expressions * 6.1. Arithmetic conversions * 6.2. Atoms * 6.2.1. Identifiers (Names) * 6.2.1.1. Private name mangling * 6.2.2. Literals * 6.2.3. Parenthesized forms * 6.2.4. Displays for lists, sets and dictionaries * 6.2.5. List displays * 6.2.6. Set displays * 6.2.7. Dictionary displays * 6.2.8. Generator expressions * 6.2.9. Yield expressions * 6.2.9.1. Generator-iterator methods * 6.2.9.2. Examples * 6.2.9.3. Asynchronous generator functions * 6.2.9.4. Asynchronous generator-iterator methods * 6.3. Primaries * 6.3.1. Attribute references * 6.3.2. Subscriptions * 6.3.3. Slicings * 6.3.4. Calls * 6.4. Await expression * 6.5. The power operator * 6.6. Unary arithmetic and bitwise operations * 6.7. Binary arithmetic operations * 6.8. Shifting operations * 6.9. Binary bitwise operations * 6.10. Comparisons * 6.10.1. Value comparisons * 6.10.2. Membership test operations * 6.10.3. Identity comparisons * 6.11. Boolean operations * 6.12. Assignment expressions * 6.13. Conditional expressions * 6.14. Lambdas * 6.15. Expression lists * 6.16. Evaluation order * 6.17. Operator precedence * 7. Simple statements * 7.1. Expression statements * 7.2. Assignment statements * 7.2.1. Augmented assignment statements * 7.2.2. Annotated assignment statements * 7.3. The "assert" statement * 7.4. The "pass" statement * 7.5. The "del" statement * 7.6. The "return" statement * 7.7. The "yield" statement * 7.8. The "raise" statement * 7.9. The "break" statement * 7.10. The "continue" statement * 7.11. The "import" statement * 7.11.1. Future statements * 7.12. The "global" statement * 7.13. The "nonlocal" statement * 7.14. The "type" statement * 8. Compound statements * 8.1. The "if" statement * 8.2. The "while" statement * 8.3. The "for" statement * 8.4. The "try" statement * 8.4.1. "except" clause * 8.4.2. "except*" clause * 8.4.3. "else" clause * 8.4.4. "finally" clause * 8.5. The "with" statement * 8.6. The "match" statement * 8.6.1. Overview * 8.6.2. Guards * 8.6.3. Irrefutable Case Blocks * 8.6.4. Patterns * 8.6.4.1. OR Patterns * 8.6.4.2. AS Patterns * 8.6.4.3. Literal Patterns * 8.6.4.4. Capture Patterns * 8.6.4.5. Wildcard Patterns * 8.6.4.6. Value Patterns * 8.6.4.7. Group Patterns * 8.6.4.8. Sequence Patterns * 8.6.4.9. Mapping Patterns * 8.6.4.10. Class Patterns * 8.7. Function definitions * 8.8. Class definitions * 8.9. Coroutines * 8.9.1. Coroutine function definition * 8.9.2. The "async for" statement * 8.9.3. The "async with" statement * 8.10. Type parameter lists * 8.10.1. Generic functions * 8.10.2. Generic classes * 8.10.3. Generic type aliases * 9. Top-level components * 9.1. Complete Python programs * 9.2. File input * 9.3. Interactive input * 9.4. Expression input * 10. 
Full Grammar specification
* The Python Standard Library * Introduction * Notes on availability * WebAssembly platforms * Mobile platforms * Built-in Functions * Built-in Constants * Constants added by the "site" module * Built-in Types * Truth Value Testing * Boolean Operations — "and", "or", "not" * Comparisons * Numeric Types — "int", "float", "complex" * Bitwise Operations on Integer Types * Additional Methods on Integer Types * Additional Methods on Float * Hashing of numeric types * Boolean Type - "bool" * Iterator Types * Generator Types * Sequence Types — "list", "tuple", "range" * Common Sequence Operations * Immutable Sequence Types * Mutable Sequence Types * Lists * Tuples * Ranges * Text Sequence Type — "str" * String Methods * Formatted String Literals (f-strings) * "printf"-style String Formatting * Binary Sequence Types — "bytes", "bytearray", "memoryview" * Bytes Objects * Bytearray Objects * Bytes and Bytearray Operations * "printf"-style Bytes Formatting * Memory Views * Set Types — "set", "frozenset" * Mapping Types — "dict" * Dictionary view objects * Context Manager Types * Type Annotation Types — *Generic Alias*, *Union* * Generic Alias Type * Standard Generic Classes * Special Attributes of "GenericAlias" objects * Union Type * Other Built-in Types * Modules * Classes and Class Instances * Functions * Methods * Code Objects * Type Objects * The Null Object * The Ellipsis Object * The NotImplemented Object * Internal Objects * Special Attributes * Integer string conversion length limitation * Affected APIs * Configuring the limit * Recommended configuration * Built-in Exceptions * Exception context * Inheriting from built-in exceptions * Base classes * Concrete exceptions * OS exceptions * Warnings * Exception groups * Exception hierarchy * Text Processing Services * "string" — Common string operations * String constants * Custom String Formatting * Format String Syntax * Format Specification Mini-Language * Format examples * Template strings * Helper functions * "re" — Regular expression operations * Regular Expression Syntax * Module Contents * Flags * Functions * Exceptions * Regular Expression Objects * Match Objects * Regular Expression Examples * Checking for a Pair * Simulating scanf() * search() vs.
match() * Making a Phonebook * Text Munging * Finding all Adverbs * Finding all Adverbs and their Positions * Raw String Notation * Writing a Tokenizer * "difflib" — Helpers for computing deltas * SequenceMatcher Objects * SequenceMatcher Examples * Differ Objects * Differ Example * A command-line interface to difflib * ndiff example * "textwrap" — Text wrapping and filling * "unicodedata" — Unicode Database * "stringprep" — Internet String Preparation * "readline" — GNU readline interface * Init file * Line buffer * History file * History list * Startup hooks * Completion * Example * "rlcompleter" — Completion function for GNU readline * Binary Data Services * "struct" — Interpret bytes as packed binary data * Functions and Exceptions * Format Strings * Byte Order, Size, and Alignment * Format Characters * Examples * Applications * Native Formats * Standard Formats * Classes * "codecs" — Codec registry and base classes * Codec Base Classes * Error Handlers * Stateless Encoding and Decoding * Incremental Encoding and Decoding * IncrementalEncoder Objects * IncrementalDecoder Objects * Stream Encoding and Decoding * StreamWriter Objects * StreamReader Objects * StreamReaderWriter Objects * StreamRecoder Objects * Encodings and Unicode * Standard Encodings * Python Specific Encodings * Text Encodings * Binary Transforms * Text Transforms * "encodings.idna" — Internationalized Domain Names in Applications * "encodings.mbcs" — Windows ANSI codepage * "encodings.utf_8_sig" — UTF-8 codec with BOM signature * Data Types * "datetime" — Basic date and time types * Aware and Naive Objects * Constants * Available Types * Common Properties * Determining if an Object is Aware or Naive * "timedelta" Objects * Examples of usage: "timedelta" * "date" Objects * Examples of Usage: "date" * "datetime" Objects * Examples of Usage: "datetime" * "time" Objects * Examples of Usage: "time" * "tzinfo" Objects * "timezone" Objects * "strftime()" and "strptime()" Behavior * "strftime()" and "strptime()" Format Codes * Technical Detail * "zoneinfo" — IANA time zone support * Using "ZoneInfo" * Data sources * Configuring the data sources * Compile-time configuration * Environment configuration * Runtime configuration * The "ZoneInfo" class * String representations * Pickle serialization * Functions * Globals * Exceptions and warnings * "calendar" — General calendar-related functions * Command-Line Usage * "collections" — Container datatypes * "ChainMap" objects * "ChainMap" Examples and Recipes * "Counter" objects * "deque" objects * "deque" Recipes * "defaultdict" objects * "defaultdict" Examples * "namedtuple()" Factory Function for Tuples with Named Fields * "OrderedDict" objects * "OrderedDict" Examples and Recipes * "UserDict" objects * "UserList" objects * "UserString" objects * "collections.abc" — Abstract Base Classes for Containers * Collections Abstract Base Classes * Collections Abstract Base Classes – Detailed Descriptions * Examples and Recipes * "heapq" — Heap queue algorithm * Basic Examples * Priority Queue Implementation Notes * Theory * "bisect" — Array bisection algorithm * Performance Notes * Searching Sorted Lists * Examples * "array" — Efficient arrays of numeric values * "weakref" — Weak references * Weak Reference Objects * Example * Finalizer Objects * Comparing finalizers with "__del__()" methods * "types" — Dynamic type creation and names for built-in types * Dynamic Type Creation * Standard Interpreter Types * Additional Utility Classes and Functions * Coroutine Utility Functions * "copy" — 
Shallow and deep copy operations * "pprint" — Data pretty printer * Functions * PrettyPrinter Objects * Example * "reprlib" — Alternate "repr()" implementation * Repr Objects * Subclassing Repr Objects * "enum" — Support for enumerations * Module Contents * Data Types * Supported "__dunder__" names * Supported "_sunder_" names * Utilities and Decorators * Notes * "graphlib" — Functionality to operate with graph-like structures * Exceptions * Numeric and Mathematical Modules * "numbers" — Numeric abstract base classes * The numeric tower * Notes for type implementers * Adding More Numeric ABCs * Implementing the arithmetic operations * "math" — Mathematical functions * Number-theoretic functions * Floating point arithmetic * Floating point manipulation functions * Power, exponential and logarithmic functions * Summation and product functions * Angular conversion * Trigonometric functions * Hyperbolic functions * Special functions * Constants * "cmath" — Mathematical functions for complex numbers * Conversions to and from polar coordinates * Power and logarithmic functions * Trigonometric functions * Hyperbolic functions * Classification functions * Constants * "decimal" — Decimal fixed-point and floating-point arithmetic * Quick-start tutorial * Decimal objects * Logical operands * Context objects * Constants * Rounding modes * Signals * Floating-point notes * Mitigating round-off error with increased precision * Special values * Working with threads * Recipes * Decimal FAQ * "fractions" — Rational numbers * "random" — Generate pseudo-random numbers * Bookkeeping functions * Functions for bytes * Functions for integers * Functions for sequences * Discrete distributions * Real-valued distributions * Alternative Generator * Notes on Reproducibility * Examples * Recipes * Command-line usage * Command-line example * "statistics" — Mathematical statistics functions * Averages and measures of central location * Measures of spread * Statistics for relations between two inputs * Function details * Exceptions * "NormalDist" objects * Examples and Recipes * Classic probability problems * Monte Carlo inputs for simulations * Approximating binomial distributions * Naive bayesian classifier * Functional Programming Modules * "itertools" — Functions creating iterators for efficient looping * Itertool Functions * Itertools Recipes * "functools" — Higher-order functions and operations on callable objects * "partial" Objects * "operator" — Standard operators as functions * Mapping Operators to Functions * In-place Operators * File and Directory Access * "pathlib" — Object-oriented filesystem paths * Basic use * Exceptions * Pure paths * General properties * Operators * Accessing individual parts * Methods and properties * Concrete paths * Parsing and generating URIs * Expanding and resolving paths * Querying file type and status * Reading and writing files * Reading directories * Creating files and directories * Renaming and deleting * Permissions and ownership * Pattern language * Comparison to the "glob" module * Comparison to the "os" and "os.path" modules * Corresponding tools * "os.path" — Common pathname manipulations * "stat" — Interpreting "stat()" results * "filecmp" — File and Directory Comparisons * The "dircmp" class * "tempfile" — Generate temporary files and directories * Examples * Deprecated functions and variables * "glob" — Unix style pathname pattern expansion * Examples * "fnmatch" — Unix filename pattern matching * "linecache" — Random access to text lines * "shutil" — High-level file 
operations * Directory and files operations * Platform-dependent efficient copy operations * copytree example * rmtree example * Archiving operations * Archiving example * Archiving example with *base_dir* * Querying the size of the output terminal * Data Persistence * "pickle" — Python object serialization * Relationship to other Python modules * Comparison with "marshal" * Comparison with "json" * Data stream format * Module Interface * What can be pickled and unpickled? * Pickling Class Instances * Persistence of External Objects * Dispatch Tables * Handling Stateful Objects * Custom Reduction for Types, Functions, and Other Objects * Out-of-band Buffers * Provider API * Consumer API * Example * Restricting Globals * Performance * Examples * "copyreg" — Register "pickle" support functions * Example * "shelve" — Python object persistence * Restrictions * Example * "marshal" — Internal Python object serialization * "dbm" — Interfaces to Unix “databases” * "dbm.sqlite3" — SQLite backend for dbm * "dbm.gnu" — GNU database manager * "dbm.ndbm" — New Database Manager * "dbm.dumb" — Portable DBM implementation * "sqlite3" — DB-API 2.0 interface for SQLite databases * Tutorial * Reference * Module functions * Module constants * Connection objects * Cursor objects * Row objects * Blob objects * PrepareProtocol objects * Exceptions * SQLite and Python types * Default adapters and converters (deprecated) * Command-line interface * How-to guides * How to use placeholders to bind values in SQL queries * How to adapt custom Python types to SQLite values * How to write adaptable objects * How to register adapter callables * How to convert SQLite values to custom Python types * Adapter and converter recipes * How to use connection shortcut methods * How to use the connection context manager * How to work with SQLite URIs * How to create and use row factories * How to handle non-UTF-8 text encodings * Explanation * Transaction control * Transaction control via the "autocommit" attribute * Transaction control via the "isolation_level" attribute * Data Compression and Archiving * "zlib" — Compression compatible with **gzip** * "gzip" — Support for **gzip** files * Examples of usage * Command Line Interface * Command line options * "bz2" — Support for **bzip2** compression * (De)compression of files * Incremental (de)compression * One-shot (de)compression * Examples of usage * "lzma" — Compression using the LZMA algorithm * Reading and writing compressed files * Compressing and decompressing data in memory * Miscellaneous * Specifying custom filter chains * Examples * "zipfile" — Work with ZIP archives * ZipFile Objects * Path Objects * PyZipFile Objects * ZipInfo Objects * Command-Line Interface * Command-line options * Decompression pitfalls * From file itself * File System limitations * Resources limitations * Interruption * Default behaviors of extraction * "tarfile" — Read and write tar archive files * TarFile Objects * TarInfo Objects * Extraction filters * Default named filters * Filter errors * Hints for further verification * Supporting older Python versions * Stateful extraction filter example * Command-Line Interface * Command-line options * Examples * Supported tar formats * Unicode issues * File Formats * "csv" — CSV File Reading and Writing * Module Contents * Dialects and Formatting Parameters * Reader Objects * Writer Objects * Examples * "configparser" — Configuration file parser * Quick Start * Supported Datatypes * Fallback Values * Supported INI File Structure * Unnamed Sections * 
Interpolation of values * Mapping Protocol Access * Customizing Parser Behaviour * Legacy API Examples * ConfigParser Objects * RawConfigParser Objects * Exceptions * "tomllib" — Parse TOML files * Examples * Conversion Table * "netrc" — netrc file processing * netrc Objects * "plistlib" — Generate and parse Apple ".plist" files * Examples * Cryptographic Services * "hashlib" — Secure hashes and message digests * Hash algorithms * Usage * Constructors * Attributes * Hash Objects * SHAKE variable length digests * File hashing * Key derivation * BLAKE2 * Creating hash objects * Constants * Examples * Simple hashing * Using different digest sizes * Keyed hashing * Randomized hashing * Personalization * Tree mode * Credits * "hmac" — Keyed-Hashing for Message Authentication * "secrets" — Generate secure random numbers for managing secrets * Random numbers * Generating tokens * How many bytes should tokens use? * Other functions * Recipes and best practices * Generic Operating System Services * "os" — Miscellaneous operating system interfaces * File Names, Command Line Arguments, and Environment Variables * Python UTF-8 Mode * Process Parameters * File Object Creation * File Descriptor Operations * Querying the size of a terminal * Inheritance of File Descriptors * Files and Directories * Timer File Descriptors * Linux extended attributes * Process Management * Interface to the scheduler * Miscellaneous System Information * Random numbers * "io" — Core tools for working with streams * Overview * Text I/O * Binary I/O * Raw I/O * Text Encoding * Opt-in EncodingWarning * High-level Module Interface * Class hierarchy * I/O Base Classes * Raw File I/O * Buffered Streams * Text I/O * Performance * Binary I/O * Text I/O * Multi-threading * Reentrancy * "time" — Time access and conversions * Functions * Clock ID Constants * Timezone Constants * "logging" — Logging facility for Python * Logger Objects * Logging Levels * Handler Objects * Formatter Objects * Filter Objects * LogRecord Objects * LogRecord attributes * LoggerAdapter Objects * Thread Safety * Module-Level Functions * Module-Level Attributes * Integration with the warnings module * "logging.config" — Logging configuration * Configuration functions * Security considerations * Configuration dictionary schema * Dictionary Schema Details * Incremental Configuration * Object connections * User-defined objects * Handler configuration order * Access to external objects * Access to internal objects * Import resolution and custom importers * Configuring QueueHandler and QueueListener * Configuration file format * "logging.handlers" — Logging handlers * StreamHandler * FileHandler * NullHandler * WatchedFileHandler * BaseRotatingHandler * RotatingFileHandler * TimedRotatingFileHandler * SocketHandler * DatagramHandler * SysLogHandler * NTEventLogHandler * SMTPHandler * MemoryHandler * HTTPHandler * QueueHandler * QueueListener * "platform" — Access to underlying platform’s identifying data * Cross platform * Java platform * Windows platform * macOS platform * iOS platform * Unix platforms * Linux platforms * Android platform * Command-line usage * "errno" — Standard errno system symbols * "ctypes" — A foreign function library for Python * ctypes tutorial * Loading dynamic link libraries * Accessing functions from loaded dlls * Calling functions * Fundamental data types * Calling functions, continued * Calling variadic functions * Calling functions with your own custom data types * Specifying the required argument types (function prototypes) * Return 
types * Passing pointers (or: passing parameters by reference) * Structures and unions * Structure/union alignment and byte order * Bit fields in structures and unions * Arrays * Pointers * Type conversions * Incomplete Types * Callback functions * Accessing values exported from dlls * Surprises * Variable-sized data types * ctypes reference * Finding shared libraries * Loading shared libraries * Foreign functions * Function prototypes * Utility functions * Data types * Fundamental data types * Structured data types * Arrays and pointers * Command Line Interface Libraries * "argparse" — Parser for command-line options, arguments and subcommands * ArgumentParser objects * prog * usage * description * epilog * parents * formatter_class * prefix_chars * fromfile_prefix_chars * argument_default * allow_abbrev * conflict_handler * add_help * exit_on_error * The add_argument() method * name or flags * action * nargs * const * default * type * choices * required * help * metavar * dest * deprecated * Action classes * The parse_args() method * Option value syntax * Invalid arguments * Arguments containing "-" * Argument abbreviations (prefix matching) * Beyond "sys.argv" * The Namespace object * Other utilities * Sub-commands * FileType objects * Argument groups * Mutual exclusion * Parser defaults * Printing help * Partial parsing * Customizing file parsing * Exiting methods * Intermixed parsing * Registering custom types or actions * Exceptions * Argparse Tutorial * Concepts * The basics * Introducing Positional arguments * Introducing Optional arguments * Short options * Combining Positional and Optional arguments * Getting a little more advanced * Specifying ambiguous arguments * Conflicting options * How to translate the argparse output * Custom type converters * Conclusion * Migrating "optparse" code to "argparse" * "optparse" — Parser for command line options * Choosing an argument parsing library * Introduction * Background * Terminology * What are options for? * What are positional arguments for? 
* Tutorial * Understanding option actions * The store action * Handling boolean (flag) options * Other actions * Default values * Generating help * Grouping Options * Printing a version string * How "optparse" handles errors * Putting it all together * Reference Guide * Creating the parser * Populating the parser * Defining options * Option attributes * Standard option actions * Standard option types * Parsing arguments * Querying and manipulating your option parser * Conflicts between options * Cleanup * Other methods * Option Callbacks * Defining a callback option * How callbacks are called * Raising errors in a callback * Callback example 1: trivial callback * Callback example 2: check option order * Callback example 3: check option order (generalized) * Callback example 4: check arbitrary condition * Callback example 5: fixed arguments * Callback example 6: variable arguments * Extending "optparse" * Adding new types * Adding new actions * Exceptions * "getpass" — Portable password input * "fileinput" — Iterate over lines from multiple input streams * "curses" — Terminal handling for character-cell displays * Functions * Window Objects * Constants * "curses.textpad" — Text input widget for curses programs * Textbox objects * "curses.ascii" — Utilities for ASCII characters * "curses.panel" — A panel stack extension for curses * Functions * Panel Objects * Concurrent Execution * "threading" — Thread-based parallelism * Introduction * GIL and performance considerations * Reference * Thread-local data * Thread objects * Lock objects * RLock objects * Condition objects * Semaphore objects * "Semaphore" example * Event objects * Timer objects * Barrier objects * Using locks, conditions, and semaphores in the "with" statement * "multiprocessing" — Process-based parallelism * Introduction * The "Process" class * Contexts and start methods * Exchanging objects between processes * Synchronization between processes * Sharing state between processes * Using a pool of workers * Reference * "Process" and exceptions * Pipes and Queues * Miscellaneous * Connection Objects * Synchronization primitives * Shared "ctypes" Objects * The "multiprocessing.sharedctypes" module * Managers * Customized managers * Using a remote manager * Proxy Objects * Cleanup * Process Pools * Listeners and Clients * Address Formats * Authentication keys * Logging * The "multiprocessing.dummy" module * Programming guidelines * All start methods * The *spawn* and *forkserver* start methods * Examples * "multiprocessing.shared_memory" — Shared memory for direct access across processes * The "concurrent" package * "concurrent.futures" — Launching parallel tasks * Executor Objects * ThreadPoolExecutor * ThreadPoolExecutor Example * ProcessPoolExecutor * ProcessPoolExecutor Example * Future Objects * Module Functions * Exception classes * "subprocess" — Subprocess management * Using the "subprocess" Module * Frequently Used Arguments * Popen Constructor * Exceptions * Security Considerations * Popen Objects * Windows Popen Helpers * Windows Constants * Older high-level API * Replacing Older Functions with the "subprocess" Module * Replacing **/bin/sh** shell command substitution * Replacing shell pipeline * Replacing "os.system()" * Replacing the "os.spawn" family * Replacing "os.popen()", "os.popen2()", "os.popen3()" * Replacing functions from the "popen2" module * Legacy Shell Invocation Functions * Notes * Timeout Behavior * Converting an argument sequence to a string on Windows * Disabling use of "vfork()" or "posix_spawn()" * 
"sched" — Event scheduler * Scheduler Objects * "queue" — A synchronized queue class * Queue Objects * Terminating queues * SimpleQueue Objects * "contextvars" — Context Variables * Context Variables * Manual Context Management * asyncio support * "_thread" — Low-level threading API * Networking and Interprocess Communication * "asyncio" — Asynchronous I/O * Runners * Running an asyncio Program * Runner context manager * Handling Keyboard Interruption * Coroutines and Tasks * Coroutines * Awaitables * Creating Tasks * Task Cancellation * Task Groups * Terminating a Task Group * Sleeping * Running Tasks Concurrently * Eager Task Factory * Shielding From Cancellation * Timeouts * Waiting Primitives * Running in Threads * Scheduling From Other Threads * Introspection * Task Object * Streams * StreamReader * StreamWriter * Examples * TCP echo client using streams * TCP echo server using streams * Get HTTP headers * Register an open socket to wait for data using streams * Synchronization Primitives * Lock * Event * Condition * Semaphore * BoundedSemaphore * Barrier * Subprocesses * Creating Subprocesses * Constants * Interacting with Subprocesses * Subprocess and Threads * Examples * Queues * Queue * Priority Queue * LIFO Queue * Exceptions * Examples * Exceptions * Event Loop * Event Loop Methods * Running and stopping the loop * Scheduling callbacks * Scheduling delayed callbacks * Creating Futures and Tasks * Opening network connections * Creating network servers * Transferring files * TLS Upgrade * Watching file descriptors * Working with socket objects directly * DNS * Working with pipes * Unix signals * Executing code in thread or process pools * Error Handling API * Enabling debug mode * Running Subprocesses * Callback Handles * Server Objects * Event Loop Implementations * Examples * Hello World with call_soon() * Display the current date with call_later() * Watch a file descriptor for read events * Set signal handlers for SIGINT and SIGTERM * Futures * Future Functions * Future Object * Transports and Protocols * Transports * Transports Hierarchy * Base Transport * Read-only Transports * Write-only Transports * Datagram Transports * Subprocess Transports * Protocols * Base Protocols * Base Protocol * Streaming Protocols * Buffered Streaming Protocols * Datagram Protocols * Subprocess Protocols * Examples * TCP Echo Server * TCP Echo Client * UDP Echo Server * UDP Echo Client * Connecting Existing Sockets * loop.subprocess_exec() and SubprocessProtocol * Policies * Getting and Setting the Policy * Policy Objects * Process Watchers * Custom Policies * Platform Support * All Platforms * Windows * Subprocess Support on Windows * macOS * Extending * Writing a Custom Event Loop * Future and Task private constructors * Task lifetime support * High-level API Index * Tasks * Queues * Subprocesses * Streams * Synchronization * Exceptions * Low-level API Index * Obtaining the Event Loop * Event Loop Methods * Transports * Protocols * Event Loop Policies * Developing with asyncio * Debug Mode * Concurrency and Multithreading * Running Blocking Code * Logging * Detect never-awaited coroutines * Detect never-retrieved exceptions * "socket" — Low-level networking interface * Socket families * Module contents * Exceptions * Constants * Functions * Creating sockets * Other functions * Socket Objects * Notes on socket timeouts * Timeouts and the "connect" method * Timeouts and the "accept" method * Example * "ssl" — TLS/SSL wrapper for socket objects * Functions, Constants, and Exceptions * Socket 
creation * Context creation * Exceptions * Random generation * Certificate handling * Constants * SSL Sockets * SSL Contexts * Certificates * Certificate chains * CA certificates * Combined key and certificate * Self-signed certificates * Examples * Testing for SSL support * Client-side operation * Server-side operation * Notes on non-blocking sockets * Memory BIO Support * SSL session * Security considerations * Best defaults * Manual settings * Verifying certificates * Protocol versions * Cipher selection * Multi-processing * TLS 1.3 * "select" — Waiting for I/O completion * "/dev/poll" Polling Objects * Edge and Level Trigger Polling (epoll) Objects * Polling Objects * Kqueue Objects * Kevent Objects * "selectors" — High-level I/O multiplexing * Introduction * Classes * Examples * "signal" — Set handlers for asynchronous events * General rules * Execution of Python signal handlers * Signals and threads * Module contents * Examples * Note on SIGPIPE * Note on Signal Handlers and Exceptions * "mmap" — Memory-mapped file support * MADV_* Constants * MAP_* Constants * Internet Data Handling * "email" — An email and MIME handling package * "email.message": Representing an email message * "email.parser": Parsing email messages * FeedParser API * Parser API * Additional notes * "email.generator": Generating MIME documents * "email.policy": Policy Objects * "email.errors": Exception and Defect classes * "email.headerregistry": Custom Header Objects * "email.contentmanager": Managing MIME Content * Content Manager Instances * "email": Examples * "email.message.Message": Representing an email message using the "compat32" API * "email.mime": Creating email and MIME objects from scratch * "email.header": Internationalized headers * "email.charset": Representing character sets * "email.encoders": Encoders * "email.utils": Miscellaneous utilities * "email.iterators": Iterators * "json" — JSON encoder and decoder * Basic Usage * Encoders and Decoders * Exceptions * Standard Compliance and Interoperability * Character Encodings * Infinite and NaN Number Values * Repeated Names Within an Object * Top-level Non-Object, Non-Array Values * Implementation Limitations * Command Line Interface * Command line options * "mailbox" — Manipulate mailboxes in various formats * "Mailbox" objects * "Maildir" objects * "mbox" objects * "MH" objects * "Babyl" objects * "MMDF" objects * "Message" objects * "MaildirMessage" objects * "mboxMessage" objects * "MHMessage" objects * "BabylMessage" objects * "MMDFMessage" objects * Exceptions * Examples * "mimetypes" — Map filenames to MIME types * MimeTypes Objects * "base64" — Base16, Base32, Base64, Base85 Data Encodings * RFC 4648 Encodings * Base85 Encodings * Legacy Interface * Security Considerations * "binascii" — Convert between binary and ASCII * "quopri" — Encode and decode MIME quoted-printable data * Structured Markup Processing Tools * "html" — HyperText Markup Language support * "html.parser" — Simple HTML and XHTML parser * Example HTML Parser Application * "HTMLParser" Methods * Examples * "html.entities" — Definitions of HTML general entities * XML Processing Modules * XML vulnerabilities * The "defusedxml" Package * "xml.etree.ElementTree" — The ElementTree XML API * Tutorial * XML tree and elements * Parsing XML * Pull API for non-blocking parsing * Finding interesting elements * Modifying an XML File * Building XML documents * Parsing XML with Namespaces * XPath support * Example * Supported XPath syntax * Reference * Functions * XInclude support * 
Example * Reference * Functions * Element Objects * ElementTree Objects * QName Objects * TreeBuilder Objects * XMLParser Objects * XMLPullParser Objects * Exceptions * "xml.dom" — The Document Object Model API * Module Contents * Objects in the DOM * DOMImplementation Objects * Node Objects * NodeList Objects * DocumentType Objects * Document Objects * Element Objects * Attr Objects * NamedNodeMap Objects * Comment Objects * Text and CDATASection Objects * ProcessingInstruction Objects * Exceptions * Conformance * Type Mapping * Accessor Methods * "xml.dom.minidom" — Minimal DOM implementation * DOM Objects * DOM Example * minidom and the DOM standard * "xml.dom.pulldom" — Support for building partial DOM trees * DOMEventStream Objects * "xml.sax" — Support for SAX2 parsers * SAXException Objects * "xml.sax.handler" — Base classes for SAX handlers * ContentHandler Objects * DTDHandler Objects * EntityResolver Objects * ErrorHandler Objects * LexicalHandler Objects * "xml.sax.saxutils" — SAX Utilities * "xml.sax.xmlreader" — Interface for XML parsers * XMLReader Objects * IncrementalParser Objects * Locator Objects * InputSource Objects * The "Attributes" Interface * The "AttributesNS" Interface * "xml.parsers.expat" — Fast XML parsing using Expat * XMLParser Objects * ExpatError Exceptions * Example * Content Model Descriptions * Expat error constants * Internet Protocols and Support * "webbrowser" — Convenient web-browser controller * Browser Controller Objects * "wsgiref" — WSGI Utilities and Reference Implementation * "wsgiref.util" – WSGI environment utilities * "wsgiref.headers" – WSGI response header tools * "wsgiref.simple_server" – a simple WSGI HTTP server * "wsgiref.validate" — WSGI conformance checker * "wsgiref.handlers" – server/gateway base classes * "wsgiref.types" – WSGI types for static type checking * Examples * "urllib" — URL handling modules * "urllib.request" — Extensible library for opening URLs * Request Objects * OpenerDirector Objects * BaseHandler Objects * HTTPRedirectHandler Objects * HTTPCookieProcessor Objects * ProxyHandler Objects * HTTPPasswordMgr Objects * HTTPPasswordMgrWithPriorAuth Objects * AbstractBasicAuthHandler Objects * HTTPBasicAuthHandler Objects * ProxyBasicAuthHandler Objects * AbstractDigestAuthHandler Objects * HTTPDigestAuthHandler Objects * ProxyDigestAuthHandler Objects * HTTPHandler Objects * HTTPSHandler Objects * FileHandler Objects * DataHandler Objects * FTPHandler Objects * CacheFTPHandler Objects * UnknownHandler Objects * HTTPErrorProcessor Objects * Examples * Legacy interface * "urllib.request" Restrictions * "urllib.response" — Response classes used by urllib * "urllib.parse" — Parse URLs into components * URL Parsing * URL parsing security * Parsing ASCII Encoded Bytes * Structured Parse Results * URL Quoting * "urllib.error" — Exception classes raised by urllib.request * "urllib.robotparser" — Parser for robots.txt * "http" — HTTP modules * HTTP status codes * HTTP status category * HTTP methods * "http.client" — HTTP protocol client * HTTPConnection Objects * HTTPResponse Objects * Examples * HTTPMessage Objects * "ftplib" — FTP protocol client * Reference * FTP objects * FTP_TLS objects * Module variables * "poplib" — POP3 protocol client * POP3 Objects * POP3 Example * "imaplib" — IMAP4 protocol client * IMAP4 Objects * IMAP4 Example * "smtplib" — SMTP protocol client * SMTP Objects * SMTP Example * "uuid" — UUID objects according to **RFC 4122** * Command-Line Usage * Example * Command-Line Example * "socketserver" — A 
framework for network servers * Server Creation Notes * Server Objects * Request Handler Objects * Examples * "socketserver.TCPServer" Example * "socketserver.UDPServer" Example * Asynchronous Mixins * "http.server" — HTTP servers * Command-line interface * Security considerations * "http.cookies" — HTTP state management * Cookie Objects * Morsel Objects * Example * "http.cookiejar" — Cookie handling for HTTP clients * CookieJar and FileCookieJar Objects * FileCookieJar subclasses and co-operation with web browsers * CookiePolicy Objects * DefaultCookiePolicy Objects * Cookie Objects * Examples * "xmlrpc" — XMLRPC server and client modules * "xmlrpc.client" — XML-RPC client access * ServerProxy Objects * DateTime Objects * Binary Objects * Fault Objects * ProtocolError Objects * MultiCall Objects * Convenience Functions * Example of Client Usage * Example of Client and Server Usage * "xmlrpc.server" — Basic XML-RPC servers * SimpleXMLRPCServer Objects * SimpleXMLRPCServer Example * CGIXMLRPCRequestHandler * Documenting XMLRPC server * DocXMLRPCServer Objects * DocCGIXMLRPCRequestHandler * "ipaddress" — IPv4/IPv6 manipulation library * Convenience factory functions * IP Addresses * Address objects * Conversion to Strings and Integers * Operators * Comparison operators * Arithmetic operators * IP Network definitions * Prefix, net mask and host mask * Network objects * Operators * Logical operators * Iteration * Networks as containers of addresses * Interface objects * Operators * Logical operators * Other Module Level Functions * Custom Exceptions * Multimedia Services * "wave" — Read and write WAV files * Wave_read Objects * Wave_write Objects * "colorsys" — Conversions between color systems * Internationalization * "gettext" — Multilingual internationalization services * GNU **gettext** API * Class-based API * The "NullTranslations" class * The "GNUTranslations" class * Solaris message catalog support * The Catalog constructor * Internationalizing your programs and modules * Localizing your module * Localizing your application * Changing languages on the fly * Deferred translations * Acknowledgements * "locale" — Internationalization services * Background, details, hints, tips and caveats * For extension writers and programs that embed Python * Access to message catalogs * Program Frameworks * "turtle" — Turtle graphics * Introduction * Get started * Tutorial * Starting a turtle environment * Basic drawing * Pen control * The turtle’s position * Making algorithmic patterns * How to… * Get started as quickly as possible * Use the "turtle" module namespace * Use turtle graphics in a script * Use object-oriented turtle graphics * Turtle graphics reference * Turtle methods * Methods of TurtleScreen/Screen * Methods of RawTurtle/Turtle and corresponding functions * Turtle motion * Tell Turtle’s state * Settings for measurement * Pen control * Drawing state * Color control * Filling * More drawing control * Turtle state * Visibility * Appearance * Using events * Special Turtle methods * Compound shapes * Methods of TurtleScreen/Screen and corresponding functions * Window control * Animation control * Using screen events * Input methods * Settings and special methods * Methods specific to Screen, not inherited from TurtleScreen * Public classes * Explanation * Help and configuration * How to use help * Translation of docstrings into different languages * How to configure Screen and Turtles * "turtledemo" — Demo scripts * Changes since Python 2.6 * Changes since Python 3.0 * "cmd" — Support for 
line-oriented command interpreters * Cmd Objects * Cmd Example * "shlex" — Simple lexical analysis * shlex Objects * Parsing Rules * Improved Compatibility with Shells * Graphical User Interfaces with Tk * "tkinter" — Python interface to Tcl/Tk * Architecture * Tkinter Modules * Tkinter Life Preserver * A Hello World Program * Important Tk Concepts * Understanding How Tkinter Wraps Tcl/Tk * How do I…? What option does…? * Navigating the Tcl/Tk Reference Manual * Threading model * Handy Reference * Setting Options * The Packer * Packer Options * Coupling Widget Variables * The Window Manager * Tk Option Data Types * Bindings and Events * The index Parameter * Images * File Handlers * "tkinter.colorchooser" — Color choosing dialog * "tkinter.font" — Tkinter font wrapper * Tkinter Dialogs * "tkinter.simpledialog" — Standard Tkinter input dialogs * "tkinter.filedialog" — File selection dialogs * Native Load/Save Dialogs * "tkinter.commondialog" — Dialog window templates * "tkinter.messagebox" — Tkinter message prompts * "tkinter.scrolledtext" — Scrolled Text Widget * "tkinter.dnd" — Drag and drop support * "tkinter.ttk" — Tk themed widgets * Using Ttk * Ttk Widgets * Widget * Standard Options * Scrollable Widget Options * Label Options * Compatibility Options * Widget States * ttk.Widget * Combobox * Options * Virtual events * ttk.Combobox * Spinbox * Options * Virtual events * ttk.Spinbox * Notebook * Options * Tab Options * Tab Identifiers * Virtual Events * ttk.Notebook * Progressbar * Options * ttk.Progressbar * Separator * Options * Sizegrip * Platform-specific notes * Bugs * Treeview * Options * Item Options * Tag Options * Column Identifiers * Virtual Events * ttk.Treeview * Ttk Styling * Layouts * IDLE — Python editor and shell * Menus * File menu (Shell and Editor) * Edit menu (Shell and Editor) * Format menu (Editor window only) * Run menu (Editor window only) * Shell menu (Shell window only) * Debug menu (Shell window only) * Options menu (Shell and Editor) * Window menu (Shell and Editor) * Help menu (Shell and Editor) * Context menus * Editing and Navigation * Editor windows * Key bindings * Automatic indentation * Search and Replace * Completions * Calltips * Code Context * Shell window * Text colors * Startup and Code Execution * Command line usage * Startup failure * Running user code * User output in Shell * Developing tkinter applications * Running without a subprocess * Help and Preferences * Help sources * Setting preferences * IDLE on macOS * Extensions * idlelib — implementation of IDLE application * Development Tools * "typing" — Support for type hints * Specification for the Python Type System * Type aliases * NewType * Annotating callable objects * Generics * Annotating tuples * The type of class objects * Annotating generators and coroutines * User-defined generic types * The "Any" type * Nominal vs structural subtyping * Module contents * Special typing primitives * Special types * Special forms * Building generic types and type aliases * Other special directives * Protocols * ABCs for working with IO * Functions and decorators * Introspection helpers * Constant * Deprecated aliases * Aliases to built-in types * Aliases to types in "collections" * Aliases to other concrete types * Aliases to container ABCs in "collections.abc" * Aliases to asynchronous ABCs in "collections.abc" * Aliases to other ABCs in "collections.abc" * Aliases to "contextlib" ABCs * Deprecation Timeline of Major Features * "pydoc" — Documentation generator and online help system * Python 
Development Mode * Effects of the Python Development Mode * ResourceWarning Example * Bad file descriptor error example * "doctest" — Test interactive Python examples * Simple Usage: Checking Examples in Docstrings * Simple Usage: Checking Examples in a Text File * Command-line Usage * How It Works * Which Docstrings Are Examined? * How are Docstring Examples Recognized? * What’s the Execution Context? * What About Exceptions? * Option Flags * Directives * Warnings * Basic API * Unittest API * Advanced API * DocTest Objects * Example Objects * DocTestFinder objects * DocTestParser objects * TestResults objects * DocTestRunner objects * OutputChecker objects * Debugging * Soapbox * "unittest" — Unit testing framework * Basic example * Command-Line Interface * Command-line options * Test Discovery * Organizing test code * Re-using old test code * Skipping tests and expected failures * Distinguishing test iterations using subtests * Classes and functions * Test cases * Grouping tests * Loading and running tests * load_tests Protocol * Class and Module Fixtures * setUpClass and tearDownClass * setUpModule and tearDownModule * Signal Handling * "unittest.mock" — mock object library * Quick Guide * The Mock Class * Calling * Deleting Attributes * Mock names and the name attribute * Attaching Mocks as Attributes * The patchers * patch * patch.object * patch.dict * patch.multiple * patch methods: start and stop * patch builtins * TEST_PREFIX * Nesting Patch Decorators * Where to patch * Patching Descriptors and Proxy Objects * MagicMock and magic method support * Mocking Magic Methods * Magic Mock * Helpers * sentinel * DEFAULT * call * create_autospec * ANY * FILTER_DIR * mock_open * Autospeccing * Sealing mocks * Order of precedence of "side_effect", "return_value" and *wraps* * "unittest.mock" — getting started * Using Mock * Mock Patching Methods * Mock for Method Calls on an Object * Mocking Classes * Naming your mocks * Tracking all Calls * Setting Return Values and Attributes * Raising exceptions with mocks * Side effect functions and iterables * Mocking asynchronous iterators * Mocking asynchronous context manager * Creating a Mock from an Existing Object * Using side_effect to return per file content * Patch Decorators * Further Examples * Mocking chained calls * Partial mocking * Mocking a Generator Method * Applying the same patch to every test method * Mocking Unbound Methods * Checking multiple calls with mock * Coping with mutable arguments * Nesting Patches * Mocking a dictionary with MagicMock * Mock subclasses and their attributes * Mocking imports with patch.dict * Tracking order of calls and less verbose call assertions * More complex argument matching * "test" — Regression tests package for Python * Writing Unit Tests for the "test" package * Running tests using the command-line interface * "test.support" — Utilities for the Python test suite * "test.support.socket_helper" — Utilities for socket tests * "test.support.script_helper" — Utilities for the Python execution tests * "test.support.bytecode_helper" — Support tools for testing correct bytecode generation * "test.support.threading_helper" — Utilities for threading tests * "test.support.os_helper" — Utilities for os tests * "test.support.import_helper" — Utilities for import tests * "test.support.warnings_helper" — Utilities for warnings tests * Debugging and Profiling * Audit events table * "bdb" — Debugger framework * "faulthandler" — Dump the Python traceback * Dumping the traceback * Fault handler state * Dumping the 
tracebacks after a timeout * Dumping the traceback on a user signal * Issue with file descriptors * Example * "pdb" — The Python Debugger * Debugger Commands * The Python Profilers * Introduction to the profilers * Instant User’s Manual * "profile" and "cProfile" Module Reference * The "Stats" Class * What Is Deterministic Profiling? * Limitations * Calibration * Using a custom timer * "timeit" — Measure execution time of small code snippets * Basic Examples * Python Interface * Command-Line Interface * Examples * "trace" — Trace or track Python statement execution * Command-Line Usage * Main options * Modifiers * Filters * Programmatic Interface * "tracemalloc" — Trace memory allocations * Examples * Display the top 10 * Compute differences * Get the traceback of a memory block * Pretty top * Record the current and peak size of all traced memory blocks * API * Functions * DomainFilter * Filter * Frame * Snapshot * Statistic * StatisticDiff * Trace * Traceback * Software Packaging and Distribution * "ensurepip" — Bootstrapping the "pip" installer * Command line interface * Module API * "venv" — Creation of virtual environments * Creating virtual environments * How venvs work * API * An example of extending "EnvBuilder" * "zipapp" — Manage executable Python zip archives * Basic Example * Command-Line Interface * Python API * Examples * Specifying the Interpreter * Creating Standalone Applications with zipapp * Caveats * The Python Zip Application Archive Format * Python Runtime Services * "sys" — System-specific parameters and functions * "sys.monitoring" — Execution event monitoring * Tool identifiers * Registering and using tools * Events * Local events * Ancillary events * Other events * The STOP_ITERATION event * Turning events on and off * Setting events globally * Per code object events * Disabling events * Registering callback functions * Callback function arguments * "sysconfig" — Provide access to Python’s configuration information * Configuration variables * Installation paths * User scheme * "posix_user" * "nt_user" * "osx_framework_user" * Home scheme * "posix_home" * Prefix scheme * "posix_prefix" * "nt" * Installation path functions * Other functions * Command-line usage * "builtins" — Built-in objects * "__main__" — Top-level code environment * "__name__ == '__main__'" * What is the “top-level code environment”? 
* Idiomatic Usage * Packaging Considerations * "__main__.py" in Python Packages * Idiomatic Usage * "import __main__" * "warnings" — Warning control * Warning Categories * The Warnings Filter * Repeated Warning Suppression Criteria * Describing Warning Filters * Default Warning Filter * Overriding the default filter * Temporarily Suppressing Warnings * Testing Warnings * Updating Code For New Versions of Dependencies * Available Functions * Available Context Managers * "dataclasses" — Data Classes * Module contents * Post-init processing * Class variables * Init-only variables * Frozen instances * Inheritance * Re-ordering of keyword-only parameters in "__init__()" * Default factory functions * Mutable default values * Descriptor-typed fields * "contextlib" — Utilities for "with"-statement contexts * Utilities * Examples and Recipes * Supporting a variable number of context managers * Catching exceptions from "__enter__" methods * Cleaning up in an "__enter__" implementation * Replacing any use of "try-finally" and flag variables * Using a context manager as a function decorator * Single use, reusable and reentrant context managers * Reentrant context managers * Reusable context managers * "abc" — Abstract Base Classes * "atexit" — Exit handlers * "atexit" Example * "traceback" — Print or retrieve a stack traceback * Module-Level Functions * "TracebackException" Objects * "StackSummary" Objects * "FrameSummary" Objects * Examples of Using the Module-Level Functions * Examples of Using "TracebackException" * "__future__" — Future statement definitions * Module Contents * "gc" — Garbage Collector interface * "inspect" — Inspect live objects * Types and members * Retrieving source code * Introspecting callables with the Signature object * Classes and functions * The interpreter stack * Fetching attributes statically * Current State of Generators, Coroutines, and Asynchronous Generators * Code Objects Bit Flags * Buffer flags * Command Line Interface * "site" — Site-specific configuration hook * "sitecustomize" * "usercustomize" * Readline configuration * Module contents * Command Line Interface * Custom Python Interpreters * "code" — Interpreter base classes * Interactive Interpreter Objects * Interactive Console Objects * "codeop" — Compile Python code * Importing Modules * "zipimport" — Import modules from Zip archives * zipimporter Objects * Examples * "pkgutil" — Package extension utility * "modulefinder" — Find modules used by a script * Example usage of "ModuleFinder" * "runpy" — Locating and executing Python modules * "importlib" — The implementation of "import" * Introduction * Functions * "importlib.abc" – Abstract base classes related to import * "importlib.machinery" – Importers and path hooks * "importlib.util" – Utility code for importers * Examples * Importing programmatically * Checking if a module can be imported * Importing a source file directly * Implementing lazy imports * Setting up an importer * Approximating "importlib.import_module()" * "importlib.resources" – Package resource reading, opening and access * Functional API * "importlib.resources.abc" – Abstract base classes for resources * "importlib.metadata" – Accessing package metadata * Overview * Functional API * Entry points * Distribution metadata * Distribution versions * Distribution files * Distribution requirements * Mapping import to distribution packages * Distributions * Distribution Discovery * Extending the search algorithm * Example * The initialization of the "sys.path" module search path * Virtual 
environments * _pth files * Embedded Python * Python Language Services * "ast" — Abstract Syntax Trees * Abstract Grammar * Node classes * Root nodes * Literals * Variables * Expressions * Subscripting * Comprehensions * Statements * Imports * Control flow * Pattern matching * Type annotations * Type parameters * Function and class definitions * Async and await * "ast" Helpers * Compiler Flags * Command-Line Usage * "symtable" — Access to the compiler’s symbol tables * Generating Symbol Tables * Examining Symbol Tables * Command-Line Usage * "token" — Constants used with Python parse trees * "keyword" — Testing for Python keywords * "tokenize" — Tokenizer for Python source * Tokenizing Input * Command-Line Usage * Examples * "tabnanny" — Detection of ambiguous indentation * "pyclbr" — Python module browser support * Function Objects * Class Objects * "py_compile" — Compile Python source files * Command-Line Interface * "compileall" — Byte-compile Python libraries * Command-line use * Public functions * "dis" — Disassembler for Python bytecode * Command-line interface * Bytecode analysis * Analysis functions * Python Bytecode Instructions * Opcode collections * "pickletools" — Tools for pickle developers * Command line usage * Command line options * Programmatic Interface * MS Windows Specific Services * "msvcrt" — Useful routines from the MS VC++ runtime * File Operations * Console I/O * Other Functions * "winreg" — Windows registry access * Functions * Constants * HKEY_* Constants * Access Rights * 64-bit Specific * Value Types * Registry Handle Objects * "winsound" — Sound-playing interface for Windows * Unix Specific Services * "posix" — The most common POSIX system calls * Large File Support * Notable Module Contents * "pwd" — The password database * "grp" — The group database * "termios" — POSIX style tty control * Example * "tty" — Terminal control functions * "pty" — Pseudo-terminal utilities * Example * "fcntl" — The "fcntl" and "ioctl" system calls * "resource" — Resource usage information * Resource Limits * Resource Usage * "syslog" — Unix syslog library routines * Examples * Simple example * Modules command-line interface (CLI) * Superseded Modules * "getopt" — C-style parser for command line options * Removed Modules * Security Considerations * Extending and Embedding the Python Interpreter * Recommended third party tools * Creating extensions without third party tools * 1. Extending Python with C or C++ * 1.1. A Simple Example * 1.2. Intermezzo: Errors and Exceptions * 1.3. Back to the Example * 1.4. The Module’s Method Table and Initialization Function * 1.5. Compilation and Linkage * 1.6. Calling Python Functions from C * 1.7. Extracting Parameters in Extension Functions * 1.8. Keyword Parameters for Extension Functions * 1.9. Building Arbitrary Values * 1.10. Reference Counts * 1.10.1. Reference Counting in Python * 1.10.2. Ownership Rules * 1.10.3. Thin Ice * 1.10.4. NULL Pointers * 1.11. Writing Extensions in C++ * 1.12. Providing a C API for an Extension Module * 2. Defining Extension Types: Tutorial * 2.1. The Basics * 2.2. Adding data and methods to the Basic example * 2.3. Providing finer control over data attributes * 2.4. Supporting cyclic garbage collection * 2.5. Subclassing other types * 3. Defining Extension Types: Assorted Topics * 3.1. Finalization and De-allocation * 3.2. Object Presentation * 3.3. Attribute Management * 3.3.1. Generic Attribute Management * 3.3.2. Type-specific Attribute Management * 3.4. Object Comparison * 3.5. 
Abstract Protocol Support * 3.6. Weak Reference Support * 3.7. More Suggestions * 4. Building C and C++ Extensions * 4.1. Building C and C++ Extensions with setuptools * 5. Building C and C++ Extensions on Windows * 5.1. A Cookbook Approach * 5.2. Differences Between Unix and Windows * 5.3. Using DLLs in Practice * Embedding the CPython runtime in a larger application * 1. Embedding Python in Another Application * 1.1. Very High Level Embedding * 1.2. Beyond Very High Level Embedding: An overview * 1.3. Pure Embedding * 1.4. Extending Embedded Python * 1.5. Embedding Python in C++ * 1.6. Compiling and Linking under Unix-like systems * Python/C API Reference Manual * Introduction * Coding standards * Include Files * Useful macros * Objects, Types and Reference Counts * Reference Counts * Reference Count Details * Types * Exceptions * Embedding Python * Debugging Builds * Recommended third party tools * C API Stability * Unstable C API * Stable Application Binary Interface * Limited C API * Stable ABI * Limited API Scope and Performance * Limited API Caveats * Platform Considerations * Contents of Limited API * The Very High Level Layer * Reference Counting * Exception Handling * Printing and clearing * Raising exceptions * Issuing warnings * Querying the error indicator * Signal Handling * Exception Classes * Exception Objects * Unicode Exception Objects * Recursion Control * Standard Exceptions * Standard Warning Categories * Utilities * Operating System Utilities * System Functions * Process Control * Importing Modules * Data marshalling support * Parsing arguments and building values * Parsing arguments * Strings and buffers * Numbers * Other objects * API Functions * Building values * String conversion and formatting * PyHash API * Reflection * Codec registry and support functions * Codec lookup API * Registry API for Unicode encoding error handlers * PyTime C API * Types * Clock Functions * Raw Clock Functions * Conversion functions * Support for Perf Maps * Abstract Objects Layer * Object Protocol * Call Protocol * The *tp_call* Protocol * The Vectorcall Protocol * Recursion Control * Vectorcall Support API * Object Calling API * Call Support API * Number Protocol * Sequence Protocol * Mapping Protocol * Iterator Protocol * Buffer Protocol * Buffer structure * Buffer request types * request-independent fields * readonly, format * shape, strides, suboffsets * contiguity requests * compound requests * Complex arrays * NumPy-style: shape and strides * PIL-style: shape, strides and suboffsets * Buffer-related functions * Concrete Objects Layer * Fundamental Objects * Type Objects * Creating Heap-Allocated Types * The "None" Object * Numeric Objects * Integer Objects * Boolean Objects * Floating-Point Objects * Pack and Unpack functions * Pack functions * Unpack functions * Complex Number Objects * Complex Numbers as C Structures * Complex Numbers as Python Objects * Sequence Objects * Bytes Objects * Byte Array Objects * Type check macros * Direct API functions * Macros * Unicode Objects and Codecs * Unicode Objects * Unicode Type * Unicode Character Properties * Creating and accessing Unicode strings * Locale Encoding * File System Encoding * wchar_t Support * Built-in Codecs * Generic Codecs * UTF-8 Codecs * UTF-32 Codecs * UTF-16 Codecs * UTF-7 Codecs * Unicode-Escape Codecs * Raw-Unicode-Escape Codecs * Latin-1 Codecs * ASCII Codecs * Character Map Codecs * MBCS codecs for Windows * Methods and Slot Functions * Tuple Objects * Struct Sequence Objects * List Objects * Container Objects 
* Dictionary Objects * Set Objects * Function Objects * Function Objects * Instance Method Objects * Method Objects * Cell Objects * Code Objects * Extra information * Other Objects * File Objects * Module Objects * Initializing C modules * Single-phase initialization * Multi-phase initialization * Low-level module creation functions * Support functions * Module lookup * Iterator Objects * Descriptor Objects * Slice Objects * Ellipsis Object * MemoryView objects * Weak Reference Objects * Capsules * Frame Objects * Frame Locals Proxies * Internal Frames * Generator Objects * Coroutine Objects * Context Variables Objects * DateTime Objects * Objects for Type Hinting * Initialization, Finalization, and Threads * Before Python Initialization * Global configuration variables * Initializing and finalizing the interpreter * Process-wide parameters * Thread State and the Global Interpreter Lock * Releasing the GIL from extension code * Non-Python created threads * Cautions about fork() * High-level API * Low-level API * Sub-interpreter support * A Per-Interpreter GIL * Bugs and caveats * Asynchronous Notifications * Profiling and Tracing * Reference tracing * Advanced Debugger Support * Thread Local Storage Support * Thread Specific Storage (TSS) API * Dynamic Allocation * Methods * Thread Local Storage (TLS) API * Synchronization Primitives * Python Critical Section API * Python Initialization Configuration * Example * PyWideStringList * PyStatus * PyPreConfig * Preinitialize Python with PyPreConfig * PyConfig * Initialization with PyConfig * Isolated Configuration * Python Configuration * Python Path Configuration * Py_GetArgcArgv() * Multi-Phase Initialization Private Provisional API * Memory Management * Overview * Allocator Domains * Raw Memory Interface * Memory Interface * Object allocators * Default Memory Allocators * Customize Memory Allocators * Debug hooks on the Python memory allocators * The pymalloc allocator * Customize pymalloc Arena Allocator * The mimalloc allocator * tracemalloc C API * Examples * Object Implementation Support * Allocating Objects on the Heap * Common Object Structures * Base object types and macros * Implementing functions and methods * Accessing attributes of extension types * Member flags * Member types * Defining Getters and Setters * Type Object Structures * Quick Reference * “tp slots” * sub-slots * slot typedefs * PyTypeObject Definition * PyObject Slots * PyVarObject Slots * PyTypeObject Slots * Static Types * Heap Types * Number Object Structures * Mapping Object Structures * Sequence Object Structures * Buffer Object Structures * Async Object Structures * Slot Type typedefs * Examples * Supporting Cyclic Garbage Collection * Controlling the Garbage Collector State * Querying Garbage Collector State * API and ABI Versioning * Monitoring C API * Generating Execution Events * Managing the Monitoring State * Installing Python Modules * Key terms * Basic usage * How do I …? * … install "pip" in versions of Python prior to Python 3.4? * … install packages just for the current user? * … install scientific Python packages? * … work with multiple versions of Python installed in parallel? 
* Common installation issues * Installing into the system Python on Linux * Pip not installed * Installing binary extensions * Python HOWTOs * Python Frequently Asked Questions * General Python FAQ * General Information * Python in the real world * Programming FAQ * General Questions * Core Language * Numbers and strings * Performance * Sequences (Tuples/Lists) * Objects * Modules * Design and History FAQ * Why does Python use indentation for grouping of statements? * Why am I getting strange results with simple arithmetic operations? * Why are floating-point calculations so inaccurate? * Why are Python strings immutable? * Why must ‘self’ be used explicitly in method definitions and calls? * Why can’t I use an assignment in an expression? * Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))? * Why is join() a string method instead of a list or tuple method? * How fast are exceptions? * Why isn’t there a switch or case statement in Python? * Can’t you emulate threads in the interpreter instead of relying on an OS-specific thread implementation? * Why can’t lambda expressions contain statements? * Can Python be compiled to machine code, C or some other language? * How does Python manage memory? * Why doesn’t CPython use a more traditional garbage collection scheme? * Why isn’t all memory freed when CPython exits? * Why are there separate tuple and list data types? * How are lists implemented in CPython? * How are dictionaries implemented in CPython? * Why must dictionary keys be immutable? * Why doesn’t list.sort() return the sorted list? * How do you specify and enforce an interface spec in Python? * Why is there no goto? * Why can’t raw strings (r-strings) end with a backslash? * Why doesn’t Python have a “with” statement for attribute assignments? * Why don’t generators support the with statement? * Why are colons required for the if/while/def/class statements? * Why does Python allow commas at the end of lists and tuples? * Library and Extension FAQ * General Library Questions * Common tasks * Threads * Input and Output * Network/Internet Programming * Databases * Mathematics and Numerics * Extending/Embedding FAQ * Can I create my own functions in C? * Can I create my own functions in C++? * Writing C is hard; are there any alternatives? * How can I execute arbitrary Python statements from C? * How can I evaluate an arbitrary Python expression from C? * How do I extract C values from a Python object? * How do I use Py_BuildValue() to create a tuple of arbitrary length? * How do I call an object’s method from C? * How do I catch the output from PyErr_Print() (or anything that prints to stdout/stderr)? * How do I access a module written in Python from C? * How do I interface to C++ objects from Python? * I added a module using the Setup file and the make fails; why? * How do I debug an extension? * I want to compile a Python module on my Linux system, but some files are missing. Why? * How do I tell “incomplete input” from “invalid input”? * How do I find undefined g++ symbols __builtin_new or __pure_virtual? * Can I create an object class with some methods implemented in C and others in Python (e.g. through inheritance)? * Python on Windows FAQ * How do I run a Python program under Windows? * How do I make Python scripts executable? * Why does Python sometimes take so long to start? * How do I make an executable from a Python script? * Is a "*.pyd" file the same as a DLL? * How can I embed Python into a Windows application? 
* How do I keep editors from inserting tabs into my Python source? * How do I check for a keypress without blocking? * How do I solve the missing api-ms-win-crt-runtime-l1-1-0.dll error? * Graphic User Interface FAQ * General GUI Questions * What GUI toolkits exist for Python? * Tkinter questions * “Why is Python Installed on my Computer?” FAQ * What is Python? * Why is Python installed on my machine? * Can I delete Python? * Deprecations * Pending Removal in Python 3.14 * Pending Removal in Python 3.15 * Pending removal in Python 3.16 * Pending Removal in Future Versions * C API Deprecations * Pending Removal in Python 3.14 * Pending Removal in Python 3.15 * Pending Removal in Future Versions * Glossary * About this documentation * Contributors to the Python documentation * Dealing with Bugs * Documentation bugs * Using the Python issue tracker * Getting started contributing to Python yourself * Copyright * History and License * History of the software * Terms and conditions for accessing or otherwise using Python * PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2 * BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0 * CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1 * CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2 * ZERO-CLAUSE BSD LICENSE FOR CODE IN THE PYTHON DOCUMENTATION * Licenses and Acknowledgements for Incorporated Software * Mersenne Twister * Sockets * Asynchronous socket services * Cookie management * Execution tracing * UUencode and UUdecode functions * XML Remote Procedure Calls * test_epoll * Select kqueue * SipHash24 * strtod and dtoa * OpenSSL * expat * libffi * zlib * cfuhash * libmpdec * W3C C14N test suite * mimalloc * asyncio * Global Unbounded Sequences (GUS)

Copyright
*********

Python and this documentation is:

Copyright © 2001-2024 Python Software Foundation. All rights reserved.

Copyright © 2000 BeOpen.com. All rights reserved.

Copyright © 1995-2000 Corporation for National Research Initiatives. All rights reserved.

Copyright © 1991-1995 Stichting Mathematisch Centrum. All rights reserved.

======================================================================

See History and License for complete license and permissions information.
Deprecations
************

Pending Removal in Python 3.14
==============================

* "argparse": The *type*, *choices*, and *metavar* parameters of "argparse.BooleanOptionalAction" are deprecated and will be removed in 3.14. (Contributed by Nikita Sobolev in gh-92248.)
* "ast": The following features have been deprecated in documentation since Python 3.8, now cause a "DeprecationWarning" to be emitted at runtime when they are accessed or used, and will be removed in Python 3.14:

  * "ast.Num"
  * "ast.Str"
  * "ast.Bytes"
  * "ast.NameConstant"
  * "ast.Ellipsis"

  Use "ast.Constant" instead. (Contributed by Serhiy Storchaka in gh-90953.)

* "asyncio":

  * The child watcher classes "MultiLoopChildWatcher", "FastChildWatcher", "AbstractChildWatcher" and "SafeChildWatcher" are deprecated and will be removed in Python 3.14. (Contributed by Kumar Aditya in gh-94597.)

  * "asyncio.set_child_watcher()", "asyncio.get_child_watcher()", "asyncio.AbstractEventLoopPolicy.set_child_watcher()" and "asyncio.AbstractEventLoopPolicy.get_child_watcher()" are deprecated and will be removed in Python 3.14. (Contributed by Kumar Aditya in gh-94597.)

  * The "get_event_loop()" method of the default event loop policy now emits a "DeprecationWarning" if there is no current event loop set and it decides to create one. (Contributed by Serhiy Storchaka and Guido van Rossum in gh-100160.)

* "collections.abc": Deprecated "ByteString". Prefer "Sequence" or "Buffer". For use in typing, prefer a union, like "bytes | bytearray", or "collections.abc.Buffer". (Contributed by Shantanu Jain in gh-91896.)

* "email": Deprecated the *isdst* parameter in "email.utils.localtime()". (Contributed by Alan Williams in gh-72346.)

* "importlib.abc" deprecated classes:

  * "importlib.abc.ResourceReader"
  * "importlib.abc.Traversable"
  * "importlib.abc.TraversableResources"

  Use "importlib.resources.abc" classes instead:

  * "importlib.resources.abc.Traversable"
  * "importlib.resources.abc.TraversableResources"

  (Contributed by Jason R. Coombs and Hugo van Kemenade in gh-93963.)

* "itertools" had undocumented, inefficient, historically buggy, and inconsistent support for copy, deepcopy, and pickle operations. This will be removed in 3.14 for a significant reduction in code volume and maintenance burden. (Contributed by Raymond Hettinger in gh-101588.)

* "multiprocessing": The default start method will change to a safer one on Linux, BSDs, and other non-macOS POSIX platforms where "'fork'" is currently the default (gh-84559). Adding a runtime warning about this was deemed too disruptive as the majority of code is not expected to care. Use the "get_context()" or "set_start_method()" APIs to explicitly specify when your code *requires* "'fork'". See Contexts and start methods.

* "pathlib": "is_relative_to()" and "relative_to()": passing additional arguments is deprecated.

* "pkgutil": "find_loader()" and "get_loader()" now raise "DeprecationWarning"; use "importlib.util.find_spec()" instead. (Contributed by Nikita Sobolev in gh-97850.)

* "pty":

  * "master_open()": use "pty.openpty()".
  * "slave_open()": use "pty.openpty()".

* "sqlite3":

  * "version" and "version_info".
  * "execute()" and "executemany()" if named placeholders are used and *parameters* is a sequence instead of a "dict".

* "typing": "ByteString", deprecated since Python 3.9, now causes a "DeprecationWarning" to be emitted when it is used.

* "urllib": "urllib.parse.Quoter" is deprecated: it was not intended to be a public API. (Contributed by Gregory P. Smith in gh-88168.)

Pending Removal in Python 3.15
==============================

* The import system:

  * Setting "__cached__" on a module while failing to set "__spec__.cached" is deprecated. In Python 3.15, "__cached__" will cease to be set or taken into consideration by the import system or standard library.
    (gh-97879)

  * Setting "__package__" on a module while failing to set "__spec__.parent" is deprecated. In Python 3.15, "__package__" will cease to be set or taken into consideration by the import system or standard library. (gh-97879)

* "ctypes":

  * The undocumented "ctypes.SetPointerType()" function has been deprecated since Python 3.13.

* "http.server":

  * The obsolete and rarely used "CGIHTTPRequestHandler" has been deprecated since Python 3.13. No direct replacement exists. *Anything* is better than CGI to interface a web server with a request handler.

  * The "--cgi" flag to the **python -m http.server** command-line interface has been deprecated since Python 3.13.

* "importlib":

  * "load_module()" method: use "exec_module()" instead.

* "locale":

  * The "getdefaultlocale()" function has been deprecated since Python 3.11. Its removal was originally planned for Python 3.13 (gh-90817), but has been postponed to Python 3.15. Use "getlocale()", "setlocale()", and "getencoding()" instead. (Contributed by Hugo van Kemenade in gh-111187.)

* "pathlib":

  * "PurePath.is_reserved()" has been deprecated since Python 3.13. Use "os.path.isreserved()" to detect reserved paths on Windows.

* "platform":

  * "java_ver()" has been deprecated since Python 3.13. This function is only useful for Jython support, has a confusing API, and is largely untested.

* "sysconfig":

  * The *check_home* argument of "sysconfig.is_python_build()" has been deprecated since Python 3.12.

* "threading":

  * "RLock()" will take no arguments in Python 3.15. Passing any arguments has been deprecated since Python 3.14, as the Python version does not permit any arguments, but the C version allows any number of positional or keyword arguments, ignoring every argument.

* "types":

  * "types.CodeType": Accessing "co_lnotab" was deprecated in **PEP 626** since 3.10 and was planned to be removed in 3.12, but it only got a proper "DeprecationWarning" in 3.12. May be removed in 3.15. (Contributed by Nikita Sobolev in gh-101866.)

* "typing":

  * The undocumented keyword argument syntax for creating "NamedTuple" classes (e.g. "Point = NamedTuple("Point", x=int, y=int)") has been deprecated since Python 3.13. Use the class-based syntax or the functional syntax instead.

  * When using the functional syntax of "TypedDict"s, failing to pass a value to the *fields* parameter ("TD = TypedDict("TD")") or passing "None" ("TD = TypedDict("TD", None)") has been deprecated since Python 3.13. Use "class TD(TypedDict): pass" or "TD = TypedDict("TD", {})" to create a TypedDict with zero fields.

  * The "typing.no_type_check_decorator()" decorator function has been deprecated since Python 3.13. After eight years in the "typing" module, it has yet to be supported by any major type checker.

* "wave":

  * The "getmark()", "setmark()", and "getmarkers()" methods of the "Wave_read" and "Wave_write" classes have been deprecated since Python 3.13.

Pending removal in Python 3.16
==============================

* The import system:

  * Setting "__loader__" on a module while failing to set "__spec__.loader" is deprecated. In Python 3.16, "__loader__" will cease to be set or taken into consideration by the import system or the standard library.

* "array":

  * The "'u'" format code ("wchar_t") has been deprecated in documentation since Python 3.3 and at runtime since Python 3.13. Use the "'w'" format code ("Py_UCS4") for Unicode characters instead.

* "asyncio":

  * "asyncio.iscoroutinefunction()" is deprecated and will be removed in Python 3.16, use "inspect.iscoroutinefunction()" instead.
    (Contributed by Jiahao Li and Kumar Aditya in gh-122875.)

* "builtins":

  * Bitwise inversion on boolean types, "~True" or "~False", has been deprecated since Python 3.12, as it produces surprising and unintuitive results ("-2" and "-1"). Use "not x" instead for the logical negation of a Boolean. In the rare case that you need the bitwise inversion of the underlying integer, convert to "int" explicitly ("~int(x)").

* "shutil":

  * The "ExecError" exception has been deprecated since Python 3.14. It has not been used by any function in "shutil" since Python 3.4, and is now an alias of "RuntimeError".

* "symtable":

  * The "Class.get_methods" method has been deprecated since Python 3.14.

* "sys":

  * The "_enablelegacywindowsfsencoding()" function has been deprecated since Python 3.13. Use the "PYTHONLEGACYWINDOWSFSENCODING" environment variable instead.

* "tarfile":

  * The undocumented and unused "TarFile.tarfile" attribute has been deprecated since Python 3.13.

Pending Removal in Future Versions
==================================

The following APIs will be removed in the future, although there is currently no date scheduled for their removal.

* "argparse": Nesting argument groups and nesting mutually exclusive groups are deprecated.

* "builtins":

  * "bool(NotImplemented)".

  * Generators: "throw(type, exc, tb)" and "athrow(type, exc, tb)" signature is deprecated: use "throw(exc)" and "athrow(exc)" instead, the single argument signature.

  * Currently Python accepts numeric literals immediately followed by keywords, for example "0in x", "1or x", "0if 1else 2". It allows confusing and ambiguous expressions like "[0x1for x in y]" (which can be interpreted as "[0x1 for x in y]" or "[0x1f or x in y]"). A syntax warning is raised if the numeric literal is immediately followed by one of the keywords "and", "else", "for", "if", "in", "is" and "or". In a future release it will be changed to a syntax error. (gh-87999)

  * Support for "__index__()" and "__int__()" method returning non-int type: these methods will be required to return an instance of a strict subclass of "int".

  * Support for "__float__()" method returning a strict subclass of "float": these methods will be required to return an instance of "float".
  * Support for "__complex__()" method returning a strict subclass of "complex": these methods will be required to return an instance of "complex".

  * Delegation of "int()" to "__trunc__()" method.

  * Passing a complex number as the *real* or *imag* argument in the "complex()" constructor is now deprecated; it should only be passed as a single positional argument. (Contributed by Serhiy Storchaka in gh-109218.)

* "calendar": "calendar.January" and "calendar.February" constants are deprecated and replaced by "calendar.JANUARY" and "calendar.FEBRUARY". (Contributed by Prince Roshan in gh-103636.)

* "codeobject.co_lnotab": use the "codeobject.co_lines()" method instead.

* "datetime":

  * "utcnow()": use "datetime.datetime.now(tz=datetime.UTC)".
  * "utcfromtimestamp()": use "datetime.datetime.fromtimestamp(timestamp, tz=datetime.UTC)".

* "gettext": Plural value must be an integer.

* "importlib":

  * "cache_from_source()" *debug_override* parameter is deprecated: use the *optimization* parameter instead.

* "importlib.metadata":

  * "EntryPoints" tuple interface.
  * Implicit "None" on return values.

* "logging": the "warn()" method has been deprecated since Python 3.3, use "warning()" instead.

* "mailbox": Use of StringIO input and text mode is deprecated, use BytesIO and binary mode instead.

* "os": Calling "os.register_at_fork()" in multi-threaded process.

* "pydoc.ErrorDuringImport": A tuple value for *exc_info* parameter is deprecated, use an exception instance.

* "re": More strict rules are now applied for numerical group references and group names in regular expressions. Only sequence of ASCII digits is now accepted as a numerical reference. The group name in bytes patterns and replacement strings can now only contain ASCII letters and digits and underscore. (Contributed by Serhiy Storchaka in gh-91760.)

* "sre_compile", "sre_constants" and "sre_parse" modules.

* "shutil": "rmtree()"’s *onerror* parameter is deprecated in Python 3.12; use the *onexc* parameter instead.

* "ssl" options and protocols:

  * "ssl.SSLContext" without protocol argument is deprecated.
  * "ssl.SSLContext": "set_npn_protocols()" and "selected_npn_protocol()" are deprecated: use ALPN instead.
  * "ssl.OP_NO_SSL*" options
  * "ssl.OP_NO_TLS*" options
  * "ssl.PROTOCOL_SSLv3"
  * "ssl.PROTOCOL_TLS"
  * "ssl.PROTOCOL_TLSv1"
  * "ssl.PROTOCOL_TLSv1_1"
  * "ssl.PROTOCOL_TLSv1_2"
  * "ssl.TLSVersion.SSLv3"
  * "ssl.TLSVersion.TLSv1"
  * "ssl.TLSVersion.TLSv1_1"

* "threading" methods:

  * "threading.Condition.notifyAll()": use "notify_all()".
  * "threading.Event.isSet()": use "is_set()".
  * "threading.Thread.isDaemon()", "threading.Thread.setDaemon()": use "threading.Thread.daemon" attribute.
  * "threading.Thread.getName()", "threading.Thread.setName()": use "threading.Thread.name" attribute.
  * "threading.currentThread()": use "threading.current_thread()".
  * "threading.activeCount()": use "threading.active_count()".

* "typing.Text" (gh-92332).

* "unittest.IsolatedAsyncioTestCase": it is deprecated to return a value that is not "None" from a test case.

* "urllib.parse" deprecated functions: use "urlparse()" instead

  * "splitattr()"
  * "splithost()"
  * "splitnport()"
  * "splitpasswd()"
  * "splitport()"
  * "splitquery()"
  * "splittag()"
  * "splittype()"
  * "splituser()"
  * "splitvalue()"
  * "to_bytes()"

* "urllib.request": "URLopener" and "FancyURLopener" style of invoking requests is deprecated. Use newer "urlopen()" functions and methods.

* "wsgiref": "SimpleHandler.stdout.write()" should not do partial writes.

* "xml.etree.ElementTree": Testing the truth value of an "Element" is deprecated. In a future release it will always return "True". Prefer explicit "len(elem)" or "elem is not None" tests instead.

* "zipimport.zipimporter.load_module()" is deprecated: use "exec_module()" instead.

C API Deprecations
==================

Pending Removal in Python 3.14
------------------------------

* The "ma_version_tag" field in "PyDictObject" for extension modules (**PEP 699**; gh-101193).

* Creating "immutable types" with mutable bases (gh-95388).

* Functions to configure Python’s initialization, deprecated in Python 3.11:

  * "PySys_SetArgvEx()": Set "PyConfig.argv" instead.
  * "PySys_SetArgv()": Set "PyConfig.argv" instead.
  * "Py_SetProgramName()": Set "PyConfig.program_name" instead.
  * "Py_SetPythonHome()": Set "PyConfig.home" instead.

  The "Py_InitializeFromConfig()" API should be used with "PyConfig" instead.

* Global configuration variables:

  * "Py_DebugFlag": Use "PyConfig.parser_debug" instead.
  * "Py_VerboseFlag": Use "PyConfig.verbose" instead.
  * "Py_QuietFlag": Use "PyConfig.quiet" instead.
  * "Py_InteractiveFlag": Use "PyConfig.interactive" instead.
  * "Py_InspectFlag": Use "PyConfig.inspect" instead.
  * "Py_OptimizeFlag": Use "PyConfig.optimization_level" instead.
  * "Py_NoSiteFlag": Use "PyConfig.site_import" instead.
  * "Py_BytesWarningFlag": Use "PyConfig.bytes_warning" instead.
  * "Py_FrozenFlag": Use "PyConfig.pathconfig_warnings" instead.
  * "Py_IgnoreEnvironmentFlag": Use "PyConfig.use_environment" instead.
  * "Py_DontWriteBytecodeFlag": Use "PyConfig.write_bytecode" instead.
  * "Py_NoUserSiteDirectory": Use "PyConfig.user_site_directory" instead.
  * "Py_UnbufferedStdioFlag": Use "PyConfig.buffered_stdio" instead.
  * "Py_HashRandomizationFlag": Use "PyConfig.use_hash_seed" and "PyConfig.hash_seed" instead.
  * "Py_IsolatedFlag": Use "PyConfig.isolated" instead.
  * "Py_LegacyWindowsFSEncodingFlag": Use "PyPreConfig.legacy_windows_fs_encoding" instead.
  * "Py_LegacyWindowsStdioFlag": Use "PyConfig.legacy_windows_stdio" instead.
  * "Py_FileSystemDefaultEncoding": Use "PyConfig.filesystem_encoding" instead.
  * "Py_HasFileSystemDefaultEncoding": Use "PyConfig.filesystem_encoding" instead.
  * "Py_FileSystemDefaultEncodeErrors": Use "PyConfig.filesystem_errors" instead.
  * "Py_UTF8Mode": Use "PyPreConfig.utf8_mode" instead. (see "Py_PreInitialize()")

  The "Py_InitializeFromConfig()" API should be used with "PyConfig" instead.

Pending Removal in Python 3.15
------------------------------

* The bundled copy of "libmpdecimal".

* The "PyImport_ImportModuleNoBlock()": Use "PyImport_ImportModule()" instead.

* "PyWeakref_GetObject()" and "PyWeakref_GET_OBJECT()": Use "PyWeakref_GetRef()" instead.

* "Py_UNICODE" type and the "Py_UNICODE_WIDE" macro: Use "wchar_t" instead.

* Python initialization functions:

  * "PySys_ResetWarnOptions()": Clear "sys.warnoptions" and "warnings.filters" instead.
  * "Py_GetExecPrefix()": Get "sys.base_exec_prefix" and "sys.exec_prefix" instead.
  * "Py_GetPath()": Get "sys.path" instead.
  * "Py_GetPrefix()": Get "sys.base_prefix" and "sys.prefix" instead.
  * "Py_GetProgramFullPath()": Get "sys.executable" instead.
  * "Py_GetProgramName()": Get "sys.executable" instead.
  * "Py_GetPythonHome()": Get "PyConfig.home" or the "PYTHONHOME" environment variable instead.

Pending Removal in Future Versions
----------------------------------

The following APIs are deprecated and will be removed, although there is currently no date scheduled for their removal.

* "Py_TPFLAGS_HAVE_FINALIZE": Unneeded since Python 3.8.
* "PyErr_Fetch()": Use "PyErr_GetRaisedException()" instead.
* "PyErr_NormalizeException()": Use "PyErr_GetRaisedException()" instead.
* "PyErr_Restore()": Use "PyErr_SetRaisedException()" instead.
* "PyModule_GetFilename()": Use "PyModule_GetFilenameObject()" instead.
* "PyOS_AfterFork()": Use "PyOS_AfterFork_Child()" instead.
* "PySlice_GetIndicesEx()": Use "PySlice_Unpack()" and "PySlice_AdjustIndices()" instead.
* "PyUnicode_AsDecodedObject()": Use "PyCodec_Decode()" instead.
* "PyUnicode_AsDecodedUnicode()": Use "PyCodec_Decode()" instead.
* "PyUnicode_AsEncodedObject()": Use "PyCodec_Encode()" instead.
* "PyUnicode_AsEncodedUnicode()": Use "PyCodec_Encode()" instead.
* "PyUnicode_READY()": Unneeded since Python 3.12.
* "PyErr_Display()": Use "PyErr_DisplayException()" instead.
* "_PyErr_ChainExceptions()": Use "_PyErr_ChainExceptions1()" instead.
* "PyBytesObject.ob_shash" member: call "PyObject_Hash()" instead.
* "PyDictObject.ma_version_tag" member.
* Thread Local Storage (TLS) API:

  * "PyThread_create_key()": Use "PyThread_tss_alloc()" instead.
  * "PyThread_delete_key()": Use "PyThread_tss_free()" instead.
  * "PyThread_set_key_value()": Use "PyThread_tss_set()" instead.
  * "PyThread_get_key_value()": Use "PyThread_tss_get()" instead.
  * "PyThread_delete_key_value()": Use "PyThread_tss_delete()" instead.
  * "PyThread_ReInitTLS()": Unneeded since Python 3.7.

Pending Removal in Python 3.13
******************************

Modules (see **PEP 594**):

* "aifc"
* "audioop"
* "cgi"
* "cgitb"
* "chunk"
* "crypt"
* "imghdr"
* "mailcap"
* "msilib"
* "nis"
* "nntplib"
* "ossaudiodev"
* "pipes"
* "sndhdr"
* "spwd"
* "sunau"
* "telnetlib"
* "uu"
* "xdrlib"

Other modules:

* "lib2to3", and the **2to3** program (gh-84540)

APIs:

* "configparser.LegacyInterpolation" (gh-90765)
* "locale.resetlocale()" (gh-90817)
* "turtle.RawTurtle.settiltangle()" (gh-50096)
* "unittest.findTestCases()" (gh-50096)
* "unittest.getTestCaseNames()" (gh-50096)
* "unittest.makeSuite()" (gh-50096)
* "unittest.TestProgram.usageExit()" (gh-67048)
* "webbrowser.MacOSX" (gh-86421)
* "classmethod" descriptor chaining (gh-89519)
* "importlib.resources" deprecated methods:

  * "contents()"
  * "is_resource()"
  * "open_binary()"
  * "open_text()"
  * "path()"
  * "read_binary()"
  * "read_text()"

  Use "importlib.resources.files()" instead. Refer to importlib-resources: Migrating from Legacy (gh-106531). A short migration sketch follows.
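The migration for the legacy "importlib.resources" functions listed above is mechanical: each call maps onto the "files()" API. The following is a minimal, illustrative sketch of the before-and-after; the package name "mypkg" and the resource name "data.txt" are hypothetical, chosen only for this example:

   from importlib.resources import files

   # Legacy (removed in Python 3.13):
   #     from importlib.resources import read_text
   #     text = read_text("mypkg", "data.txt")

   # Replacement: files() returns a Traversable that can be joined
   # and read directly ("mypkg" and "data.txt" are hypothetical).
   text = files("mypkg").joinpath("data.txt").read_text(encoding="utf-8")

   # The binary variants follow the same pattern:
   raw = files("mypkg").joinpath("data.txt").read_bytes()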
* "xml.etree.ElementTree": Testing the truth value of an "Element" is deprecated. In a future release it will always return "True". Prefer explicit "len(elem)" or "elem is not None" tests instead. * "zipimport.zipimporter.load_module()" is deprecated: use "exec_module()" instead. Distributing Python Modules *************************** Note: Information and guidance on distributing Python modules and packages has been moved to the Python Packaging User Guide, and the tutorial on packaging Python projects. 4. Building C and C++ Extensions ******************************** A C extension for CPython is a shared library (e.g. a ".so" file on Linux, ".pyd" on Windows), which exports an *initialization function*. To be importable, the shared library must be available on "PYTHONPATH", and must be named after the module name, with an appropriate extension. When using setuptools, the correct filename is generated automatically. The initialization function has the signature: PyObject *PyInit_modulename(void) It returns either a fully initialized module, or a "PyModuleDef" instance. See Initializing C modules for details. For modules with ASCII-only names, the function must be named "PyInit_**", with "" replaced by the name of the module. When using Multi-phase initialization, non-ASCII module names are allowed. In this case, the initialization function name is "PyInitU_**", with "" encoded using Python’s *punycode* encoding with hyphens replaced by underscores. In Python: def initfunc_name(name): try: suffix = b'_' + name.encode('ascii') except UnicodeEncodeError: suffix = b'U_' + name.encode('punycode').replace(b'-', b'_') return b'PyInit' + suffix It is possible to export multiple modules from a single shared library by defining multiple initialization functions. However, importing them requires using symbolic links or a custom importer, because by default only the function corresponding to the filename is found. See the *“Multiple modules in one library”* section in **PEP 489** for details. 4.1. Building C and C++ Extensions with setuptools ================================================== Python 3.12 and newer no longer come with distutils. Please refer to the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html to learn more about how build and distribute C/C++ extensions with setuptools. 1. Embedding Python in Another Application ****************************************** The previous chapters discussed how to extend Python, that is, how to extend the functionality of Python by attaching a library of C functions to it. It is also possible to do it the other way around: enrich your C/C++ application by embedding Python in it. Embedding provides your application with the ability to implement some of the functionality of your application in Python rather than C or C++. This can be used for many purposes; one example would be to allow users to tailor the application to their needs by writing some scripts in Python. You can also use it yourself if some of the functionality can be written in Python more easily. Embedding Python is similar to extending it, but not quite. The difference is that when you extend Python, the main program of the application is still the Python interpreter, while if you embed Python, the main program may have nothing to do with Python — instead, some parts of the application occasionally call the Python interpreter to run some Python code. So if you are embedding Python, you are providing your own main program. 
One of the things this main program has to do is initialize the Python interpreter. At the very least, you have to call the function "Py_Initialize()". There are optional calls to pass command line arguments to Python. Then later you can call the interpreter from any part of the application.

There are several different ways to call the interpreter: you can pass a string containing Python statements to "PyRun_SimpleString()", or you can pass a stdio file pointer and a file name (for identification in error messages only) to "PyRun_SimpleFile()". You can also call the lower-level operations described in the previous chapters to construct and use Python objects.

See also: Python/C API Reference Manual The details of Python’s C interface are given in this manual. A great deal of necessary information can be found here.

1.1. Very High Level Embedding
==============================

The simplest form of embedding Python is the use of the very high level interface. This interface is intended to execute a Python script without needing to interact with the application directly. This can for example be used to perform some operation on a file.

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   int
   main(int argc, char *argv[])
   {
       PyStatus status;
       PyConfig config;
       PyConfig_InitPythonConfig(&config);

       /* optional but recommended */
       status = PyConfig_SetBytesString(&config, &config.program_name,
                                        argv[0]);
       if (PyStatus_Exception(status)) {
           goto exception;
       }

       status = Py_InitializeFromConfig(&config);
       if (PyStatus_Exception(status)) {
           goto exception;
       }
       PyConfig_Clear(&config);

       PyRun_SimpleString("from time import time,ctime\n"
                          "print('Today is', ctime(time()))\n");
       if (Py_FinalizeEx() < 0) {
           exit(120);
       }
       return 0;

     exception:
       PyConfig_Clear(&config);
       Py_ExitStatusException(status);
   }

Note: "#define PY_SSIZE_T_CLEAN" was used to indicate that "Py_ssize_t" should be used in some APIs instead of "int". It has not been necessary since Python 3.13, but we keep it here for backward compatibility. See Strings and buffers for a description of this macro.

"PyConfig.program_name" should be set before calling "Py_InitializeFromConfig()" to inform the interpreter about paths to Python run-time libraries. Next, the Python interpreter is initialized with "Py_InitializeFromConfig()", followed by the execution of a hard-coded Python script that prints the date and time. Afterwards, the "Py_FinalizeEx()" call shuts the interpreter down, followed by the end of the program. In a real program, you may want to get the Python script from another source, perhaps a text-editor routine, a file, or a database. Getting the Python code from a file can better be done by using the "PyRun_SimpleFile()" function, which saves you the trouble of allocating memory space and loading the file contents.

1.2. Beyond Very High Level Embedding: An overview
==================================================

The high level interface gives you the ability to execute arbitrary pieces of Python code from your application, but exchanging data values is quite cumbersome, to say the least. If you want that, you should use lower level calls. At the cost of having to write more C code, you can achieve almost anything.

It should be noted that extending Python and embedding Python is quite the same activity, despite the different intent. Most topics discussed in the previous chapters are still valid. To show this, consider what the extension code from Python to C really does:

1. Convert data values from Python to C,

2. Perform a function call to a C routine using the converted values, and
3. Convert the data values from the call from C to Python.

When embedding Python, the interface code does:

1. Convert data values from C to Python,

2. Perform a function call to a Python interface routine using the converted values, and

3. Convert the data values from the call from Python to C.

As you can see, the data conversion steps are simply swapped to accommodate the different direction of the cross-language transfer. The only difference is the routine that you call between both data conversions. When extending, you call a C routine; when embedding, you call a Python routine.

This chapter will not discuss how to convert data from Python to C and vice versa. Also, proper use of references and dealing with errors is assumed to be understood. Since these aspects do not differ from extending the interpreter, you can refer to earlier chapters for the required information.

1.3. Pure Embedding
===================

The first program aims to execute a function in a Python script. Like in the section about the very high level interface, the Python interpreter does not directly interact with the application (but that will change in the next section).

The code to run a function defined in a Python script is:

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   int
   main(int argc, char *argv[])
   {
       PyObject *pName, *pModule, *pFunc;
       PyObject *pArgs, *pValue;
       int i;

       if (argc < 3) {
           fprintf(stderr,"Usage: call pythonfile funcname [args]\n");
           return 1;
       }

       Py_Initialize();
       pName = PyUnicode_DecodeFSDefault(argv[1]);
       /* Error checking of pName left out */

       pModule = PyImport_Import(pName);
       Py_DECREF(pName);

       if (pModule != NULL) {
           pFunc = PyObject_GetAttrString(pModule, argv[2]);
           /* pFunc is a new reference */

           if (pFunc && PyCallable_Check(pFunc)) {
               pArgs = PyTuple_New(argc - 3);
               for (i = 0; i < argc - 3; ++i) {
                   pValue = PyLong_FromLong(atoi(argv[i + 3]));
                   if (!pValue) {
                       Py_DECREF(pArgs);
                       Py_DECREF(pModule);
                       fprintf(stderr, "Cannot convert argument\n");
                       return 1;
                   }
                   /* pValue reference stolen here: */
                   PyTuple_SetItem(pArgs, i, pValue);
               }
               pValue = PyObject_CallObject(pFunc, pArgs);
               Py_DECREF(pArgs);
               if (pValue != NULL) {
                   printf("Result of call: %ld\n", PyLong_AsLong(pValue));
                   Py_DECREF(pValue);
               }
               else {
                   Py_DECREF(pFunc);
                   Py_DECREF(pModule);
                   PyErr_Print();
                   fprintf(stderr,"Call failed\n");
                   return 1;
               }
           }
           else {
               if (PyErr_Occurred())
                   PyErr_Print();
               fprintf(stderr, "Cannot find function \"%s\"\n", argv[2]);
           }
           Py_XDECREF(pFunc);
           Py_DECREF(pModule);
       }
       else {
           PyErr_Print();
           fprintf(stderr, "Failed to load \"%s\"\n", argv[1]);
           return 1;
       }
       if (Py_FinalizeEx() < 0) {
           return 120;
       }
       return 0;
   }

This code loads a Python script using "argv[1]", and calls the function named in "argv[2]". Its integer arguments are the other values of the "argv" array. If you compile and link this program (let’s call the finished executable **call**), and use it to execute a Python script, such as:

   def multiply(a, b):
       print("Will compute", a, "times", b)
       c = 0
       for i in range(0, a):
           c = c + b
       return c

then the result should be:

   $ call multiply multiply 3 2
   Will compute 3 times 2
   Result of call: 6

Although the program is quite large for its functionality, most of the code is for data conversion between Python and C, and for error reporting. The interesting part with respect to embedding Python starts with

   Py_Initialize();
   pName = PyUnicode_DecodeFSDefault(argv[1]);
   /* Error checking of pName left out */
   pModule = PyImport_Import(pName);

After initializing the interpreter, the script is loaded using "PyImport_Import()".
This routine needs a Python string as its argument, which is constructed using the "PyUnicode_DecodeFSDefault()" data conversion routine.

   pFunc = PyObject_GetAttrString(pModule, argv[2]);
   /* pFunc is a new reference */

   if (pFunc && PyCallable_Check(pFunc)) {
       ...
   }
   Py_XDECREF(pFunc);

Once the script is loaded, the name we’re looking for is retrieved using "PyObject_GetAttrString()". If the name exists, and the object returned is callable, you can safely assume that it is a function. The program then proceeds by constructing a tuple of arguments as normal. The call to the Python function is then made with:

   pValue = PyObject_CallObject(pFunc, pArgs);

Upon return of the function, "pValue" is either "NULL" or it contains a reference to the return value of the function. Be sure to release the reference after examining the value.

1.4. Extending Embedded Python
==============================

Until now, the embedded Python interpreter had no access to functionality from the application itself. The Python API allows this by extending the embedded interpreter. That is, the embedded interpreter gets extended with routines provided by the application. While it sounds complex, it is not so bad. Simply forget for a while that the application starts the Python interpreter. Instead, consider the application to be a set of subroutines, and write some glue code that gives Python access to those routines, just like you would write a normal Python extension. For example:

   static int numargs=0;

   /* Return the number of arguments of the application command line */
   static PyObject*
   emb_numargs(PyObject *self, PyObject *args)
   {
       if(!PyArg_ParseTuple(args, ":numargs"))
           return NULL;
       return PyLong_FromLong(numargs);
   }

   static PyMethodDef emb_module_methods[] = {
       {"numargs", emb_numargs, METH_VARARGS,
        "Return the number of arguments received by the process."},
       {NULL, NULL, 0, NULL}
   };

   static struct PyModuleDef emb_module = {
       .m_base = PyModuleDef_HEAD_INIT,
       .m_name = "emb",
       .m_size = 0,
       .m_methods = emb_module_methods,
   };

   static PyObject*
   PyInit_emb(void)
   {
       return PyModuleDef_Init(&emb_module);
   }

Insert the above code just above the "main()" function. Also, insert the following two statements before the call to "Py_Initialize()":

   numargs = argc;
   PyImport_AppendInittab("emb", &PyInit_emb);

These two lines initialize the "numargs" variable, and make the "emb.numargs()" function accessible to the embedded Python interpreter. With these extensions, the Python script can do things like

   import emb
   print("Number of arguments", emb.numargs())

In a real application, the methods will expose an API of the application to Python.

1.5. Embedding Python in C++
============================

It is also possible to embed Python in a C++ program; precisely how this is done will depend on the details of the C++ system used; in general you will need to write the main program in C++, and use the C++ compiler to compile and link your program. There is no need to recompile Python itself using C++.

1.6. Compiling and Linking under Unix-like systems
==================================================

It is not necessarily trivial to find the right flags to pass to your compiler (and linker) in order to embed the Python interpreter into your application, particularly because Python needs to load library modules implemented as C dynamic extensions (".so" files) linked against it.
To find out the required compiler and linker flags, you can execute the "python*X.Y*-config" script which is generated as part of the installation process (a "python3-config" script may also be available). This script has several options, of which the following will be directly useful to you:

* "pythonX.Y-config --cflags" will give you the recommended flags when compiling:

     $ /opt/bin/python3.11-config --cflags
     -I/opt/include/python3.11 -I/opt/include/python3.11 -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall

* "pythonX.Y-config --ldflags --embed" will give you the recommended flags when linking:

     $ /opt/bin/python3.11-config --ldflags --embed
     -L/opt/lib/python3.11/config-3.11-x86_64-linux-gnu -L/opt/lib -lpython3.11 -lpthread -ldl -lutil -lm

Note: To avoid confusion between several Python installations (and especially between the system Python and your own compiled Python), it is recommended that you use the absolute path to "python*X.Y*-config", as in the above example.

If this procedure doesn’t work for you (it is not guaranteed to work for all Unix-like platforms; however, we welcome bug reports), you will have to read your system’s documentation about dynamic linking and/or examine Python’s "Makefile" (use "sysconfig.get_makefile_filename()" to find its location) and compilation options. In this case, the "sysconfig" module is a useful tool to programmatically extract the configuration values that you will want to combine together. For example:

   >>> import sysconfig
   >>> sysconfig.get_config_var('LIBS')
   '-lpthread -ldl -lutil'
   >>> sysconfig.get_config_var('LINKFORSHARED')
   '-Xlinker -export-dynamic'

1. Extending Python with C or C++
*********************************

It is quite easy to add new built-in modules to Python, if you know how to program in C. Such *extension modules* can do two things that can’t be done directly in Python: they can implement new built-in object types, and they can call C library functions and system calls.

To support extensions, the Python API (Application Programmers Interface) defines a set of functions, macros and variables that provide access to most aspects of the Python run-time system. The Python API is incorporated in a C source file by including the header ""Python.h"". The compilation of an extension module depends on its intended use as well as on your system setup; details are given in later chapters.

Note: The C extension interface is specific to CPython, and extension modules do not work on other Python implementations. In many cases, it is possible to avoid writing C extensions and preserve portability to other implementations. For example, if your use case is calling C library functions or system calls, you should consider using the "ctypes" module or the cffi library rather than writing custom C code. These modules let you write Python code to interface with C code and are more portable between implementations of Python than writing and compiling a C extension module.

1.1. A Simple Example
=====================

Let’s create an extension module called "spam" (the favorite food of Monty Python fans…) and let’s say we want to create a Python interface to the C library function "system()" [1]. This function takes a null-terminated character string as argument and returns an integer. We want this function to be callable from Python as follows:

   >>> import spam
   >>> status = spam.system("ls -l")

Begin by creating a file "spammodule.c".
(Historically, if a module is called "spam", the C file containing its implementation is called "spammodule.c"; if the module name is very long, like "spammify", the module name can be just "spammify.c".)

The first two lines of our file can be:

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

which pulls in the Python API (you can add a comment describing the purpose of the module and a copyright notice if you like).

Note: Since Python may define some pre-processor definitions which affect the standard headers on some systems, you *must* include "Python.h" before any standard headers are included. "#define PY_SSIZE_T_CLEAN" was used to indicate that "Py_ssize_t" should be used in some APIs instead of "int". It has not been necessary since Python 3.13, but we keep it here for backward compatibility. See Strings and buffers for a description of this macro.

All user-visible symbols defined by "Python.h" have a prefix of "Py" or "PY", except those defined in standard header files. For convenience, and since they are used extensively by the Python interpreter, ""Python.h"" includes a few standard header files: "<stdio.h>", "<string.h>", "<errno.h>", and "<stdlib.h>". If the latter header file does not exist on your system, it declares the functions "malloc()", "free()" and "realloc()" directly.

The next thing we add to our module file is the C function that will be called when the Python expression "spam.system(string)" is evaluated (we’ll see shortly how it ends up being called):

   static PyObject *
   spam_system(PyObject *self, PyObject *args)
   {
       const char *command;
       int sts;

       if (!PyArg_ParseTuple(args, "s", &command))
           return NULL;
       sts = system(command);
       return PyLong_FromLong(sts);
   }

There is a straightforward translation from the argument list in Python (for example, the single expression ""ls -l"") to the arguments passed to the C function. The C function always has two arguments, conventionally named *self* and *args*.

The *self* argument points to the module object for module-level functions; for a method it would point to the object instance.

The *args* argument will be a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call’s argument list. The arguments are Python objects — in order to do anything with them in our C function we have to convert them to C values. The function "PyArg_ParseTuple()" in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values. More about this later.

"PyArg_ParseTuple()" returns true (nonzero) if all arguments have the right type and its components have been stored in the variables whose addresses are passed. It returns false (zero) if an invalid argument list was passed. In the latter case it also raises an appropriate exception so the calling function can return "NULL" immediately (as we saw in the example).

1.2. Intermezzo: Errors and Exceptions
======================================

An important convention throughout the Python interpreter is the following: when a function fails, it should set an exception condition and return an error value (usually "-1" or a "NULL" pointer). Exception information is stored in three members of the interpreter’s thread state. These are "NULL" if there is no exception. Otherwise they are the C equivalents of the members of the Python tuple returned by "sys.exc_info()". These are the exception type, exception instance, and a traceback object.
It is important to know about them to understand how errors are passed around.

The Python API defines a number of functions to set various types of exceptions. The most common one is "PyErr_SetString()". Its arguments are an exception object and a C string. The exception object is usually a predefined object like "PyExc_ZeroDivisionError". The C string indicates the cause of the error and is converted to a Python string object and stored as the “associated value” of the exception.

Another useful function is "PyErr_SetFromErrno()", which only takes an exception argument and constructs the associated value by inspection of the global variable "errno". The most general function is "PyErr_SetObject()", which takes two object arguments, the exception and its associated value. You don’t need to "Py_INCREF()" the objects passed to any of these functions.

You can test non-destructively whether an exception has been set with "PyErr_Occurred()". This returns the current exception object, or "NULL" if no exception has occurred. You normally don’t need to call "PyErr_Occurred()" to see whether an error occurred in a function call, since you should be able to tell from the return value.

When a function *f* that calls another function *g* detects that the latter fails, *f* should itself return an error value (usually "NULL" or "-1"). It should *not* call one of the "PyErr_*" functions — one has already been called by *g*. *f*’s caller is then supposed to also return an error indication to *its* caller, again *without* calling "PyErr_*", and so on — the most detailed cause of the error was already reported by the function that first detected it. Once the error reaches the Python interpreter’s main loop, this aborts the currently executing Python code and tries to find an exception handler specified by the Python programmer.

(There are situations where a module can actually give a more detailed error message by calling another "PyErr_*" function, and in such cases it is fine to do so. As a general rule, however, this is not necessary, and can cause information about the cause of the error to be lost: most operations can fail for a variety of reasons.)

To ignore an exception set by a function call that failed, the exception condition must be cleared explicitly by calling "PyErr_Clear()". The only time C code should call "PyErr_Clear()" is if it doesn’t want to pass the error on to the interpreter but wants to handle it completely by itself (possibly by trying something else, or pretending nothing went wrong).

Every failing "malloc()" call must be turned into an exception — the direct caller of "malloc()" (or "realloc()") must call "PyErr_NoMemory()" and return a failure indicator itself. All the object-creating functions (for example, "PyLong_FromLong()") already do this, so this note is only relevant to those who call "malloc()" directly.

Also note that, with the important exception of "PyArg_ParseTuple()" and friends, functions that return an integer status usually return a positive value or zero for success and "-1" for failure, like Unix system calls.

Finally, be careful to clean up garbage (by making "Py_XDECREF()" or "Py_DECREF()" calls for objects you have already created) when you return an error indicator!

The choice of which exception to raise is entirely yours. There are predeclared C objects corresponding to all built-in Python exceptions, such as "PyExc_ZeroDivisionError", which you can use directly.
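For instance, the propagation convention described above might look like the following in practice. This is a minimal sketch with hypothetical helper names, not code from the "spam" example:

   /* Hypothetical helper: returns a new reference on success, or NULL
      with an exception set. */
   static PyObject *
   get_widget(long id)
   {
       if (id < 0) {
           PyErr_SetString(PyExc_ValueError, "id must be non-negative");
           return NULL;
       }
       return PyLong_FromLong(id);   /* may itself fail and set an exception */
   }

   static PyObject *
   use_widget(PyObject *self, PyObject *args)
   {
       long id;
       if (!PyArg_ParseTuple(args, "l", &id))
           return NULL;              /* exception already set by the parser */
       PyObject *w = get_widget(id);
       if (w == NULL)
           return NULL;              /* propagate: no extra PyErr_* call here */
       return w;                     /* pass ownership of the reference on */
   }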
Of course, you should choose exceptions wisely — don’t use "PyExc_TypeError" to mean that a file couldn’t be opened (that should probably be "PyExc_OSError"). If something’s wrong with the argument list, the "PyArg_ParseTuple()" function usually raises "PyExc_TypeError". If you have an argument whose value must be in a particular range or must satisfy other conditions, "PyExc_ValueError" is appropriate.

You can also define a new exception that is unique to your module. The simplest way to do this is to declare a static global object variable at the beginning of the file:

   static PyObject *SpamError = NULL;

and initialize it by calling "PyErr_NewException()" in the module’s "Py_mod_exec" function ("spam_module_exec()"):

   SpamError = PyErr_NewException("spam.error", NULL, NULL);

Since "SpamError" is a global variable, it will be overwritten every time the module is reinitialized, when the "Py_mod_exec" function is called.

For now, let’s avoid the issue: we will block repeated initialization by raising an "ImportError":

   static PyObject *SpamError = NULL;

   static int
   spam_module_exec(PyObject *m)
   {
       if (SpamError != NULL) {
           PyErr_SetString(PyExc_ImportError,
                           "cannot initialize spam module more than once");
           return -1;
       }
       SpamError = PyErr_NewException("spam.error", NULL, NULL);
       if (PyModule_AddObjectRef(m, "SpamError", SpamError) < 0) {
           return -1;
       }

       return 0;
   }

   static PyModuleDef_Slot spam_module_slots[] = {
       {Py_mod_exec, spam_module_exec},
       {0, NULL}
   };

   static struct PyModuleDef spam_module = {
       .m_base = PyModuleDef_HEAD_INIT,
       .m_name = "spam",
       .m_size = 0,  // non-negative
       .m_slots = spam_module_slots,
   };

   PyMODINIT_FUNC
   PyInit_spam(void)
   {
       return PyModuleDef_Init(&spam_module);
   }

Note that the Python name for the exception object is "spam.error". The "PyErr_NewException()" function may create a class with the base class being "Exception" (unless another class is passed in instead of "NULL"), described in Built-in Exceptions.

Note also that the "SpamError" variable retains a reference to the newly created exception class; this is intentional! Since the exception could be removed from the module by external code, an owned reference to the class is needed to ensure that it will not be discarded, causing "SpamError" to become a dangling pointer. Should it become a dangling pointer, C code which raises the exception could cause a core dump or other unintended side effects.

For now, the "Py_DECREF()" call to remove this reference is missing. Even when the Python interpreter shuts down, the global "SpamError" variable will not be garbage-collected. It will “leak”. We did, however, ensure that this will happen at most once per process.

We discuss the use of "PyMODINIT_FUNC" as a function return type later in this sample.

The "spam.error" exception can be raised in your extension module using a call to "PyErr_SetString()" as shown below:

   static PyObject *
   spam_system(PyObject *self, PyObject *args)
   {
       const char *command;
       int sts;

       if (!PyArg_ParseTuple(args, "s", &command))
           return NULL;
       sts = system(command);
       if (sts < 0) {
           PyErr_SetString(SpamError, "System command failed");
           return NULL;
       }
       return PyLong_FromLong(sts);
   }

1.3. Back to the Example
========================

Going back to our example function, you should now be able to understand this statement:

   if (!PyArg_ParseTuple(args, "s", &command))
       return NULL;

It returns "NULL" (the error indicator for functions returning object pointers) if an error is detected in the argument list, relying on the exception set by "PyArg_ParseTuple()".
Otherwise the string value of the argument has been copied to the local variable "command". This is a pointer assignment and you are not supposed to modify the string to which it points (so in Standard C, the variable "command" should properly be declared as "const char *command").

The next statement is a call to the Unix function "system()", passing it the string we just got from "PyArg_ParseTuple()":

   sts = system(command);

Our "spam.system()" function must return the value of "sts" as a Python object. This is done using the function "PyLong_FromLong()".

   return PyLong_FromLong(sts);

In this case, it will return an integer object. (Yes, even integers are objects on the heap in Python!)

If you have a C function that returns no useful value (a function returning "void"), the corresponding Python function must return "None". You need this idiom to do so (which is implemented by the "Py_RETURN_NONE" macro):

   Py_INCREF(Py_None);
   return Py_None;

"Py_None" is the C name for the special Python object "None". It is a genuine Python object rather than a "NULL" pointer, which means “error” in most contexts, as we have seen.

1.4. The Module’s Method Table and Initialization Function
==========================================================

I promised to show how "spam_system()" is called from Python programs. First, we need to list its name and address in a “method table”:

   static PyMethodDef spam_methods[] = {
       ...
       {"system", spam_system, METH_VARARGS,
        "Execute a shell command."},
       ...
       {NULL, NULL, 0, NULL}        /* Sentinel */
   };

Note the third entry ("METH_VARARGS"). This is a flag telling the interpreter the calling convention to be used for the C function. It should normally always be "METH_VARARGS" or "METH_VARARGS | METH_KEYWORDS"; a value of "0" means that an obsolete variant of "PyArg_ParseTuple()" is used.

When using only "METH_VARARGS", the function should expect the Python-level parameters to be passed in as a tuple acceptable for parsing via "PyArg_ParseTuple()"; more information on this function is provided below.

The "METH_KEYWORDS" bit may be set in the third field if keyword arguments should be passed to the function. In this case, the C function should accept a third "PyObject *" parameter which will be a dictionary of keywords. Use "PyArg_ParseTupleAndKeywords()" to parse the arguments to such a function.

The method table must be referenced in the module definition structure:

   static struct PyModuleDef spam_module = {
       ...
       .m_methods = spam_methods,
       ...
   };

This structure, in turn, must be passed to the interpreter in the module’s initialization function. The initialization function must be named "PyInit_name()", where *name* is the name of the module, and should be the only non-"static" item defined in the module file:

   PyMODINIT_FUNC
   PyInit_spam(void)
   {
       return PyModuleDef_Init(&spam_module);
   }

Note that "PyMODINIT_FUNC" declares the function as "PyObject *" return type, declares any special linkage declarations required by the platform, and for C++ declares the function as "extern "C"".

"PyInit_spam()" is called when each interpreter imports its module "spam" for the first time. (See below for comments about embedding Python.) A pointer to the module definition must be returned via "PyModuleDef_Init()", so that the import machinery can create the module and store it in "sys.modules".

When embedding Python, the "PyInit_spam()" function is not called automatically unless there’s an entry in the "PyImport_Inittab" table.
To add the module to the initialization table, use "PyImport_AppendInittab()", optionally followed by an import of the module:

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   int
   main(int argc, char *argv[])
   {
       PyStatus status;
       PyConfig config;
       PyConfig_InitPythonConfig(&config);

       /* Add a built-in module, before Py_Initialize */
       if (PyImport_AppendInittab("spam", PyInit_spam) == -1) {
           fprintf(stderr, "Error: could not extend in-built modules table\n");
           exit(1);
       }

       /* Pass argv[0] to the Python interpreter */
       status = PyConfig_SetBytesString(&config, &config.program_name,
                                        argv[0]);
       if (PyStatus_Exception(status)) {
           goto exception;
       }

       /* Initialize the Python interpreter.  Required.
          If this step fails, it will be a fatal error. */
       status = Py_InitializeFromConfig(&config);
       if (PyStatus_Exception(status)) {
           goto exception;
       }
       PyConfig_Clear(&config);

       /* Optionally import the module; alternatively,
          import can be deferred until the embedded script
          imports it. */
       PyObject *pmodule = PyImport_ImportModule("spam");
       if (!pmodule) {
           PyErr_Print();
           fprintf(stderr, "Error: could not import module 'spam'\n");
       }

       // ... use Python C API here ...

       return 0;

     exception:
       PyConfig_Clear(&config);
       Py_ExitStatusException(status);
   }

Note: If you declare a global variable or a local static one, the module may experience unintended side-effects on re-initialisation, for example when removing entries from "sys.modules" or importing compiled modules into multiple interpreters within a process (or following a "fork()" without an intervening "exec()"). If module state is not yet fully isolated, authors should consider marking the module as having no support for subinterpreters (via "Py_MOD_MULTIPLE_INTERPRETERS_NOT_SUPPORTED").

A more substantial example module is included in the Python source distribution as "Modules/xxlimited.c". This file may be used as a template or simply read as an example.

1.5. Compilation and Linkage
============================

There are two more things to do before you can use your new extension: compiling and linking it with the Python system. If you use dynamic loading, the details may depend on the style of dynamic loading your system uses; see the chapters about building extension modules (chapter Building C and C++ Extensions) and additional information that pertains only to building on Windows (chapter Building C and C++ Extensions on Windows) for more information about this.

If you can’t use dynamic loading, or if you want to make your module a permanent part of the Python interpreter, you will have to change the configuration setup and rebuild the interpreter. Luckily, this is very simple on Unix: just place your file ("spammodule.c" for example) in the "Modules/" directory of an unpacked source distribution, add a line to the file "Modules/Setup.local" describing your file:

   spam spammodule.o

and rebuild the interpreter by running **make** in the toplevel directory. You can also run **make** in the "Modules/" subdirectory, but then you must first rebuild "Makefile" there by running ‘**make** Makefile’. (This is necessary each time you change the "Setup" file.)

If your module requires additional libraries to link with, these can be listed on the line in the configuration file as well, for instance:

   spam spammodule.o -lX11

1.6. Calling Python Functions from C
====================================

So far we have concentrated on making C functions callable from Python. The reverse is also useful: calling Python functions from C.
This is especially the case for libraries that support so-called “callback” functions. If a C interface makes use of callbacks, the equivalent Python often needs to provide a callback mechanism to the Python programmer; the implementation will require calling the Python callback functions from a C callback. Other uses are also imaginable.

Fortunately, the Python interpreter is easily called recursively, and there is a standard interface to call a Python function. (I won’t dwell on how to call the Python parser with a particular string as input — if you’re interested, have a look at the implementation of the "-c" command line option in "Modules/main.c" from the Python source code.)

Calling a Python function is easy. First, the Python program must somehow pass you the Python function object. You should provide a function (or some other interface) to do this. When this function is called, save a pointer to the Python function object (be careful to "Py_INCREF()" it!) in a global variable — or wherever you see fit. For example, the following function might be part of a module definition:

   static PyObject *my_callback = NULL;

   static PyObject *
   my_set_callback(PyObject *dummy, PyObject *args)
   {
       PyObject *result = NULL;
       PyObject *temp;

       if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
           if (!PyCallable_Check(temp)) {
               PyErr_SetString(PyExc_TypeError, "parameter must be callable");
               return NULL;
           }
           Py_XINCREF(temp);         /* Add a reference to new callback */
           Py_XDECREF(my_callback);  /* Dispose of previous callback */
           my_callback = temp;       /* Remember new callback */
           /* Boilerplate to return "None" */
           Py_INCREF(Py_None);
           result = Py_None;
       }
       return result;
   }

This function must be registered with the interpreter using the "METH_VARARGS" flag; this is described in section The Module’s Method Table and Initialization Function. The "PyArg_ParseTuple()" function and its arguments are documented in section Extracting Parameters in Extension Functions.

The macros "Py_XINCREF()" and "Py_XDECREF()" increment/decrement the reference count of an object and are safe in the presence of "NULL" pointers (but note that *temp* will not be "NULL" in this context). More info on them in section Reference Counts.

Later, when it is time to call the function, you call the C function "PyObject_CallObject()". This function has two arguments, both pointers to arbitrary Python objects: the Python function, and the argument list. The argument list must always be a tuple object, whose length is the number of arguments. To call the Python function with no arguments, pass in "NULL", or an empty tuple; to call it with one argument, pass a singleton tuple. "Py_BuildValue()" returns a tuple when its format string consists of zero or more format codes between parentheses. For example:

   int arg;
   PyObject *arglist;
   PyObject *result;
   ...
   arg = 123;
   ...
   /* Time to call the callback */
   arglist = Py_BuildValue("(i)", arg);
   result = PyObject_CallObject(my_callback, arglist);
   Py_DECREF(arglist);

"PyObject_CallObject()" returns a Python object pointer: this is the return value of the Python function. "PyObject_CallObject()" is “reference-count-neutral” with respect to its arguments. In the example a new tuple was created to serve as the argument list, which is "Py_DECREF()"-ed immediately after the "PyObject_CallObject()" call.

The return value of "PyObject_CallObject()" is “new”: either it is a brand new object, or it is an existing object whose reference count has been incremented.
So, unless you want to save it in a global variable, you should somehow "Py_DECREF()" the result, even (especially!) if you are not interested in its value.

Before you do this, however, it is important to check that the return value isn’t "NULL". If it is, the Python function terminated by raising an exception. If the C code that called "PyObject_CallObject()" is called from Python, it should now return an error indication to its Python caller, so the interpreter can print a stack trace, or the calling Python code can handle the exception. If this is not possible or desirable, the exception should be cleared by calling "PyErr_Clear()". For example:

   if (result == NULL)
       return NULL; /* Pass error back */
   ...use result...
   Py_DECREF(result);

Depending on the desired interface to the Python callback function, you may also have to provide an argument list to "PyObject_CallObject()". In some cases the argument list is also provided by the Python program, through the same interface that specified the callback function. It can then be saved and used in the same manner as the function object. In other cases, you may have to construct a new tuple to pass as the argument list. The simplest way to do this is to call "Py_BuildValue()". For example, if you want to pass an integral event code, you might use the following code:

   PyObject *arglist;
   ...
   arglist = Py_BuildValue("(l)", eventcode);
   result = PyObject_CallObject(my_callback, arglist);
   Py_DECREF(arglist);
   if (result == NULL)
       return NULL; /* Pass error back */
   /* Here maybe use the result */
   Py_DECREF(result);

Note the placement of "Py_DECREF(arglist)" immediately after the call, before the error check! Also note that strictly speaking this code is not complete: "Py_BuildValue()" may run out of memory, and this should be checked.

You may also call a function with keyword arguments by using "PyObject_Call()", which supports arguments and keyword arguments. As in the above example, we use "Py_BuildValue()" to construct the dictionary; note that the positional-arguments tuple passed to "PyObject_Call()" must not be "NULL", so an empty tuple is used here.

   PyObject *args;
   PyObject *dict;
   ...
   args = PyTuple_New(0);  /* PyObject_Call() requires a (possibly empty) tuple */
   dict = Py_BuildValue("{s:i}", "name", val);
   result = PyObject_Call(my_callback, args, dict);
   Py_DECREF(dict);
   Py_DECREF(args);
   if (result == NULL)
       return NULL; /* Pass error back */
   /* Here maybe use the result */
   Py_DECREF(result);

1.7. Extracting Parameters in Extension Functions
=================================================

The "PyArg_ParseTuple()" function is declared as follows:

   int PyArg_ParseTuple(PyObject *arg, const char *format, ...);

The *arg* argument must be a tuple object containing an argument list passed from Python to a C function. The *format* argument must be a format string, whose syntax is explained in Parsing arguments and building values in the Python/C API Reference Manual. The remaining arguments must be addresses of variables whose type is determined by the format string.

Note that while "PyArg_ParseTuple()" checks that the Python arguments have the required types, it cannot check the validity of the addresses of C variables passed to the call: if you make mistakes there, your code will probably crash or at least overwrite random bits in memory. So be careful!

Note that any Python object references which are provided to the caller are *borrowed* references; do not decrement their reference count!
Some example calls:

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   int ok;
   int i, j;
   long k, l;
   const char *s;
   Py_ssize_t size;

   ok = PyArg_ParseTuple(args, ""); /* No arguments */
       /* Python call: f() */

   ok = PyArg_ParseTuple(args, "s", &s); /* A string */
       /* Possible Python call: f('whoops!') */

   ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
       /* Possible Python call: f(1, 2, 'three') */

   ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
       /* A pair of ints and a string, whose size is also returned */
       /* Possible Python call: f((1, 2), 'three') */

   {
       const char *file;
       const char *mode = "r";
       int bufsize = 0;
       ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
       /* A string, and optionally another string and an integer */
       /* Possible Python calls:
          f('spam')
          f('spam', 'w')
          f('spam', 'wb', 100000) */
   }

   {
       int left, top, right, bottom, h, v;
       ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
                &left, &top, &right, &bottom, &h, &v);
       /* A rectangle and a point */
       /* Possible Python call:
          f(((0, 0), (400, 300)), (10, 10)) */
   }

   {
       Py_complex c;
       ok = PyArg_ParseTuple(args, "D:myfunction", &c);
       /* a complex, also providing a function name for errors */
       /* Possible Python call: myfunction(1+2j) */
   }

1.8. Keyword Parameters for Extension Functions
===============================================

The "PyArg_ParseTupleAndKeywords()" function is declared as follows:

   int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
                                   const char *format, char * const *kwlist, ...);

The *arg* and *format* parameters are identical to those of the "PyArg_ParseTuple()" function. The *kwdict* parameter is the dictionary of keywords received as the third parameter from the Python runtime. The *kwlist* parameter is a "NULL"-terminated list of strings which identify the parameters; the names are matched with the type information from *format* from left to right. On success, "PyArg_ParseTupleAndKeywords()" returns true, otherwise it returns false and raises an appropriate exception.

Note: Nested tuples cannot be parsed when using keyword arguments! Keyword parameters passed in which are not present in the *kwlist* will cause "TypeError" to be raised.

Here is an example module which uses keywords, based on an example by Geoff Philbrick (philbrick@hks.com):

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   static PyObject *
   keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds)
   {
       int voltage;
       const char *state = "a stiff";
       const char *action = "voom";
       const char *type = "Norwegian Blue";

       static char *kwlist[] = {"voltage", "state", "action", "type", NULL};

       if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist,
                                        &voltage, &state, &action, &type))
           return NULL;

       printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",
              action, voltage);
       printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);

       Py_RETURN_NONE;
   }

   static PyMethodDef keywdarg_methods[] = {
       /* The cast of the function is necessary since PyCFunction values
        * only take two PyObject* parameters, and keywdarg_parrot() takes
        * three.
        */
       {"parrot", (PyCFunction)(void(*)(void))keywdarg_parrot, METH_VARARGS | METH_KEYWORDS,
        "Print a lovely skit to standard output."},
       {NULL, NULL, 0, NULL}   /* sentinel */
   };

   static struct PyModuleDef keywdarg_module = {
       .m_base = PyModuleDef_HEAD_INIT,
       .m_name = "keywdarg",
       .m_size = 0,
       .m_methods = keywdarg_methods,
   };

   PyMODINIT_FUNC
   PyInit_keywdarg(void)
   {
       return PyModuleDef_Init(&keywdarg_module);
   }
1.9. Building Arbitrary Values
==============================

This function is the counterpart to "PyArg_ParseTuple()". It is declared as follows:

   PyObject *Py_BuildValue(const char *format, ...);

It recognizes a set of format units similar to the ones recognized by "PyArg_ParseTuple()", but the arguments (which are input to the function, not output) must not be pointers, just values. It returns a new Python object, suitable for returning from a C function called from Python.

One difference with "PyArg_ParseTuple()": while the latter requires its first argument to be a tuple (since Python argument lists are always represented as tuples internally), "Py_BuildValue()" does not always build a tuple. It builds a tuple only if its format string contains two or more format units. If the format string is empty, it returns "None"; if it contains exactly one format unit, it returns whatever object is described by that format unit. To force it to return a tuple of size 0 or one, parenthesize the format string.

Examples (to the left the call, to the right the resulting Python value):

   Py_BuildValue("")                        None
   Py_BuildValue("i", 123)                  123
   Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
   Py_BuildValue("s", "hello")              'hello'
   Py_BuildValue("y", "hello")              b'hello'
   Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
   Py_BuildValue("s#", "hello", 4)          'hell'
   Py_BuildValue("y#", "hello", 4)          b'hell'
   Py_BuildValue("()")                      ()
   Py_BuildValue("(i)", 123)                (123,)
   Py_BuildValue("(ii)", 123, 456)          (123, 456)
   Py_BuildValue("(i,i)", 123, 456)         (123, 456)
   Py_BuildValue("[i,i]", 123, 456)         [123, 456]
   Py_BuildValue("{s:i,s:i}",
                 "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
   Py_BuildValue("((ii)(ii)) (ii)",
                 1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))

1.10. Reference Counts
======================

In languages like C or C++, the programmer is responsible for dynamic allocation and deallocation of memory on the heap. In C, this is done using the functions "malloc()" and "free()". In C++, the operators "new" and "delete" are used with essentially the same meaning and we’ll restrict the following discussion to the C case.

Every block of memory allocated with "malloc()" should eventually be returned to the pool of available memory by exactly one call to "free()". It is important to call "free()" at the right time. If a block’s address is forgotten but "free()" is not called for it, the memory it occupies cannot be reused until the program terminates. This is called a *memory leak*. On the other hand, if a program calls "free()" for a block and then continues to use the block, it creates a conflict with reuse of the block through another "malloc()" call. This is called *using freed memory*. It has the same bad consequences as referencing uninitialized data — core dumps, wrong results, mysterious crashes.

Common causes of memory leaks are unusual paths through the code. For instance, a function may allocate a block of memory, do some calculation, and then free the block again. Now a change in the requirements for the function may add a test to the calculation that detects an error condition and can return prematurely from the function. It’s easy to forget to free the allocated memory block when taking this premature exit, especially when it is added later to the code.
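A minimal sketch of this pattern (the function, its arguments and its error condition are all hypothetical):

   #include <stdlib.h>

   /* Hypothetical function illustrating the premature-exit leak. */
   int compute(size_t n, const double *input)
   {
       double *buf = malloc(n * sizeof(double));
       if (buf == NULL)
           return -1;
       if (input == NULL) {
           return -1;      /* BUG: error exit added later; buf is leaked */
       }
       /* ...calculation using buf... */
       free(buf);          /* the only path that releases the block */
       return 0;
   }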
Such leaks, once introduced, often go undetected for a long time: the error exit is taken only in a small fraction of all calls, and most modern machines have plenty of virtual memory, so the leak only becomes apparent in a long-running process that uses the leaking function frequently. Therefore, it’s important to prevent leaks from happening by having a coding convention or strategy that minimizes this kind of error.

Since Python makes heavy use of "malloc()" and "free()", it needs a strategy to avoid memory leaks as well as the use of freed memory. The chosen method is called *reference counting*. The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and which is decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed.

An alternative strategy is called *automatic garbage collection*. (Sometimes, reference counting is also referred to as a garbage collection strategy, hence my use of “automatic” to distinguish the two.) The big advantage of automatic garbage collection is that the user doesn’t need to call "free()" explicitly. (Another claimed advantage is an improvement in speed or memory usage — this is no hard fact however.) The disadvantage is that for C, there is no truly portable automatic garbage collector, while reference counting can be implemented portably (as long as the functions "malloc()" and "free()" are available — which the C Standard guarantees). Maybe some day a sufficiently portable automatic garbage collector will be available for C. Until then, we’ll have to live with reference counts.

While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.

The cycle detector is able to detect garbage cycles and can reclaim them. The "gc" module exposes a way to run the detector (the "collect()" function), as well as configuration interfaces and the ability to disable the detector at runtime.

1.10.1. Reference Counting in Python
------------------------------------

There are two macros, "Py_INCREF(x)" and "Py_DECREF(x)", which handle the incrementing and decrementing of the reference count. "Py_DECREF()" also frees the object when the count reaches zero. For flexibility, it doesn’t call "free()" directly — rather, it makes a call through a function pointer in the object’s *type object*. For this purpose (and others), every object also contains a pointer to its type object.

The big question now remains: when to use "Py_INCREF(x)" and "Py_DECREF(x)"? Let’s first introduce some terms. Nobody “owns” an object; however, you can *own a reference* to an object. An object’s reference count is now defined as the number of owned references to it. The owner of a reference is responsible for calling "Py_DECREF()" when the reference is no longer needed.
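As a minimal illustration of owning a reference (the function name is hypothetical):

   /* Owns the new reference returned by PyLong_FromLong() and is
      therefore responsible for disposing of it with Py_DECREF(). */
   static void
   owner_example(void)
   {
       PyObject *num = PyLong_FromLong(42);   /* new owned reference */
       if (num == NULL)
           return;                            /* failure: nothing to release */
       /* ...use num... */
       Py_DECREF(num);                        /* dispose of the owned reference */
   }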
Ownership of a reference can be transferred. There are three ways to dispose of an owned reference: pass it on, store it, or call "Py_DECREF()". Forgetting to dispose of an owned reference creates a memory leak.

It is also possible to *borrow* [2] a reference to an object. The borrower of a reference should not call "Py_DECREF()". The borrower must not hold on to the object longer than the owner from which it was borrowed. Using a borrowed reference after the owner has disposed of it risks using freed memory and should be avoided completely [3].

The advantage of borrowing over owning a reference is that you don’t need to take care of disposing of the reference on all possible paths through the code — in other words, with a borrowed reference you don’t run the risk of leaking when a premature exit is taken. The disadvantage of borrowing over owning is that there are some subtle situations where in seemingly correct code a borrowed reference can be used after the owner from which it was borrowed has in fact disposed of it.

A borrowed reference can be changed into an owned reference by calling "Py_INCREF()". This does not affect the status of the owner from which the reference was borrowed — it creates a new owned reference, and gives full owner responsibilities (the new owner must dispose of the reference properly, as well as the previous owner).

1.10.2. Ownership Rules
-----------------------

Whenever an object reference is passed into or out of a function, it is part of the function’s interface specification whether ownership is transferred with the reference or not.

Most functions that return a reference to an object pass on ownership with the reference. In particular, all functions whose purpose is to create a new object, such as "PyLong_FromLong()" and "Py_BuildValue()", pass ownership to the receiver. Even if the object is not actually new, you still receive ownership of a new reference to that object. For instance, "PyLong_FromLong()" maintains a cache of popular values and can return a reference to a cached item.

Many functions that extract objects from other objects also transfer ownership with the reference, for instance "PyObject_GetAttrString()". The picture is less clear, here, however, since a few common routines are exceptions: "PyTuple_GetItem()", "PyList_GetItem()", "PyDict_GetItem()", and "PyDict_GetItemString()" all return references that you borrow from the tuple, list or dictionary.

The function "PyImport_AddModule()" also returns a borrowed reference, even though it may actually create the object it returns: this is possible because an owned reference to the object is stored in "sys.modules".

When you pass an object reference into another function, in general, the function borrows the reference from you — if it needs to store it, it will use "Py_INCREF()" to become an independent owner. There are exactly two important exceptions to this rule: "PyTuple_SetItem()" and "PyList_SetItem()". These functions take over ownership of the item passed to them — even if they fail! (Note that "PyDict_SetItem()" and friends don’t take over ownership — they are “normal.”)

When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns. Only when such a borrowed reference must be stored or passed on does it need to be turned into an owned reference, by calling "Py_INCREF()".
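For example, a function that stores one of its (borrowed) arguments might look like the following minimal sketch; the module-level cache and the function itself are hypothetical:

   static PyObject *stored = NULL;   /* hypothetical module-level cache */

   static PyObject *
   remember(PyObject *self, PyObject *args)
   {
       PyObject *obj;
       if (!PyArg_ParseTuple(args, "O", &obj))   /* obj is borrowed */
           return NULL;
       Py_INCREF(obj);        /* about to store it: become an owner first */
       Py_XDECREF(stored);    /* dispose of any previously stored reference */
       stored = obj;
       Py_RETURN_NONE;
   }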
The object reference returned from a C function that is called from Python must be an owned reference — ownership is transferred from the function to its caller.

1.10.3. Thin Ice
----------------

There are a few situations where seemingly harmless use of a borrowed reference can lead to problems. These all have to do with implicit invocations of the interpreter, which can cause the owner of a reference to dispose of it.

The first and most important case to know about is using "Py_DECREF()" on an unrelated object while borrowing a reference to a list item. For instance:

   void
   bug(PyObject *list)
   {
       PyObject *item = PyList_GetItem(list, 0);

       PyList_SetItem(list, 1, PyLong_FromLong(0L));
       PyObject_Print(item, stdout, 0); /* BUG! */
   }

This function first borrows a reference to "list[0]", then replaces "list[1]" with the value "0", and finally prints the borrowed reference. Looks harmless, right? But it’s not!

Let’s follow the control flow into "PyList_SetItem()". The list owns references to all its items, so when item 1 is replaced, it has to dispose of the original item 1. Now let’s suppose the original item 1 was an instance of a user-defined class, and let’s further suppose that the class defined a "__del__()" method. If this class instance has a reference count of 1, disposing of it will call its "__del__()" method.

Since it is written in Python, the "__del__()" method can execute arbitrary Python code. Could it perhaps do something to invalidate the reference to "item" in "bug()"? You bet! Assuming that the list passed into "bug()" is accessible to the "__del__()" method, it could execute a statement to the effect of "del list[0]", and assuming this was the last reference to that object, it would free the memory associated with it, thereby invalidating "item".

The solution, once you know the source of the problem, is easy: temporarily increment the reference count. The correct version of the function reads:

   void
   no_bug(PyObject *list)
   {
       PyObject *item = PyList_GetItem(list, 0);

       Py_INCREF(item);
       PyList_SetItem(list, 1, PyLong_FromLong(0L));
       PyObject_Print(item, stdout, 0);
       Py_DECREF(item);
   }

This is a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his "__del__()" methods would fail…

The second case of problems with a borrowed reference is a variant involving threads. Normally, multiple threads in the Python interpreter can’t get in each other’s way, because there is a *global lock* protecting Python’s entire object space. However, it is possible to temporarily release this lock using the macro "Py_BEGIN_ALLOW_THREADS", and to re-acquire it using "Py_END_ALLOW_THREADS". This is common around blocking I/O calls, to let other threads use the processor while waiting for the I/O to complete. Obviously, the following function has the same problem as the previous one:

   void
   bug(PyObject *list)
   {
       PyObject *item = PyList_GetItem(list, 0);
       Py_BEGIN_ALLOW_THREADS
       ...some blocking I/O call...
       Py_END_ALLOW_THREADS
       PyObject_Print(item, stdout, 0); /* BUG! */
   }

1.10.4. NULL Pointers
---------------------

In general, functions that take object references as arguments do not expect you to pass them "NULL" pointers, and will dump core (or cause later core dumps) if you do so. Functions that return object references generally return "NULL" only to indicate that an exception occurred.
The reason for not testing for "NULL" arguments is that functions often pass the objects they receive on to other functions — if each function were to test for "NULL", there would be a lot of redundant tests and the code would run more slowly. It is better to test for "NULL" only at the “source”: when a pointer that may be "NULL" is received, for example, from "malloc()" or from a function that may raise an exception. The macros "Py_INCREF()" and "Py_DECREF()" do not check for "NULL" pointers — however, their variants "Py_XINCREF()" and "Py_XDECREF()" do. The macros for checking for a particular object type ("Pytype_Check()") don’t check for "NULL" pointers — again, there is much code that calls several of these in a row to test an object against various different expected types, and this would generate redundant tests. There are no variants with "NULL" checking. The C function calling mechanism guarantees that the argument list passed to C functions ("args" in the examples) is never "NULL" — in fact it guarantees that it is always a tuple [4]. It is a severe error to ever let a "NULL" pointer “escape” to the Python user. 1.11. Writing Extensions in C++ =============================== It is possible to write extension modules in C++. Some restrictions apply. If the main program (the Python interpreter) is compiled and linked by the C compiler, global or static objects with constructors cannot be used. This is not a problem if the main program is linked by the C++ compiler. Functions that will be called by the Python interpreter (in particular, module initialization functions) have to be declared using "extern "C"". It is unnecessary to enclose the Python header files in "extern "C" {...}" — they use this form already if the symbol "__cplusplus" is defined (all recent C++ compilers define this symbol). 1.12. Providing a C API for an Extension Module =============================================== Many extension modules just provide new functions and types to be used from Python, but sometimes the code in an extension module can be useful for other extension modules. For example, an extension module could implement a type “collection” which works like a list but without ordering. Just like the standard Python list type has a C API which permits extension modules to create and manipulate lists, this new collection type should have a set of C functions for direct manipulation from other extension modules. At first sight this seems easy: just write the functions (without declaring them "static", of course), provide an appropriate header file, and document the C API. And in fact this would work if all extension modules were always linked statically with the Python interpreter. When modules are used as shared libraries, however, the symbols defined in one module may not be visible to another module. The details of visibility depend on the operating system; some systems use one global namespace for the Python interpreter and all extension modules (Windows, for example), whereas others require an explicit list of imported symbols at module link time (AIX is one example), or offer a choice of different strategies (most Unices). And even if symbols are globally visible, the module whose functions one wishes to call might not have been loaded yet! Portability therefore requires that you make no assumptions about symbol visibility.
This means that all symbols in extension modules should be declared "static", except for the module’s initialization function, in order to avoid name clashes with other extension modules (as discussed in section The Module’s Method Table and Initialization Function). And it means that symbols that *should* be accessible from other extension modules must be exported in a different way. Python provides a special mechanism to pass C-level information (pointers) from one extension module to another one: Capsules. A Capsule is a Python data type which stores a pointer (void*). Capsules can only be created and accessed via their C API, but they can be passed around like any other Python object. In particular, they can be assigned to a name in an extension module’s namespace. Other extension modules can then import this module, retrieve the value of this name, and then retrieve the pointer from the Capsule. There are many ways in which Capsules can be used to export the C API of an extension module. Each function could get its own Capsule, or all C API pointers could be stored in an array whose address is published in a Capsule. And the various tasks of storing and retrieving the pointers can be distributed in different ways between the module providing the code and the client modules. Whichever method you choose, it’s important to name your Capsules properly. The function "PyCapsule_New()" takes a name parameter (const char*); you’re permitted to pass in a "NULL" name, but we strongly encourage you to specify a name. Properly named Capsules provide a degree of runtime type-safety; there is no feasible way to tell one unnamed Capsule from another. In particular, Capsules used to expose C APIs should be given a name following this convention: modulename.attributename The convenience function "PyCapsule_Import()" makes it easy to load a C API provided via a Capsule, but only if the Capsule’s name matches this convention. This behavior gives C API users a high degree of certainty that the Capsule they load contains the correct C API. The following example demonstrates an approach that puts most of the burden on the writer of the exporting module, which is appropriate for commonly used library modules. It stores all C API pointers (just one in the example!) in an array of void pointers which becomes the value of a Capsule. The header file corresponding to the module provides a macro that takes care of importing the module and retrieving its C API pointers; client modules only have to call this macro before accessing the C API. The exporting module is a modification of the "spam" module from section A Simple Example. The function "spam.system()" does not call the C library function "system()" directly, but a function "PySpam_System()", which would of course do something more complicated in reality (such as adding “spam” to every command). This function "PySpam_System()" is also exported to other extension modules. 
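Before walking through the spam example in detail, here is a minimal, self-contained sketch of the Capsule round trip in isolation; the module name "mymod", the Capsule name "mymod._C_API" and the helper names are illustrative only:

   #include <Python.h>

   static void *table[1];   /* the C API pointers to export; contents omitted */

   /* Exporting side (e.g. in the module's exec function): wrap the
      table's address in a Capsule named after the module attribute. */
   static PyObject *
   export_api(void)
   {
       return PyCapsule_New((void *)table, "mymod._C_API", NULL);
   }

   /* Importing side: PyCapsule_Import() imports "mymod", fetches its
      "_C_API" attribute, verifies the Capsule's name, and returns the
      stored pointer (or NULL with an exception set). */
   static int
   import_api(void)
   {
       void **api = (void **)PyCapsule_Import("mymod._C_API", 0);
       return (api != NULL) ? 0 : -1;
   }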
The function "PySpam_System()" is a plain C function, declared "static" like everything else: static int PySpam_System(const char *command) { return system(command); } The function "spam_system()" is modified in a trivial way: static PyObject * spam_system(PyObject *self, PyObject *args) { const char *command; int sts; if (!PyArg_ParseTuple(args, "s", &command)) return NULL; sts = PySpam_System(command); return PyLong_FromLong(sts); } In the beginning of the module, right after the line #include two more lines must be added: #define SPAM_MODULE #include "spammodule.h" The "#define" is used to tell the header file that it is being included in the exporting module, not a client module. Finally, the module’s "mod_exec" function must take care of initializing the C API pointer array: static int spam_module_exec(PyObject *m) { static void *PySpam_API[PySpam_API_pointers]; PyObject *c_api_object; /* Initialize the C API pointer array */ PySpam_API[PySpam_System_NUM] = (void *)PySpam_System; /* Create a Capsule containing the API pointer array's address */ c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL); if (PyModule_Add(m, "_C_API", c_api_object) < 0) { return -1; } return 0; } Note that "PySpam_API" is declared "static"; otherwise the pointer array would disappear when "PyInit_spam()" terminates! The bulk of the work is in the header file "spammodule.h", which looks like this: #ifndef Py_SPAMMODULE_H #define Py_SPAMMODULE_H #ifdef __cplusplus extern "C" { #endif /* Header file for spammodule */ /* C API functions */ #define PySpam_System_NUM 0 #define PySpam_System_RETURN int #define PySpam_System_PROTO (const char *command) /* Total number of C API pointers */ #define PySpam_API_pointers 1 #ifdef SPAM_MODULE /* This section is used when compiling spammodule.c */ static PySpam_System_RETURN PySpam_System PySpam_System_PROTO; #else /* This section is used in modules that use spammodule's API */ static void **PySpam_API; #define PySpam_System \ (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM]) /* Return -1 on error, 0 on success. * PyCapsule_Import will set an exception if there's an error. */ static int import_spam(void) { PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0); return (PySpam_API != NULL) ? 0 : -1; } #endif #ifdef __cplusplus } #endif #endif /* !defined(Py_SPAMMODULE_H) */ All that a client module must do in order to have access to the function "PySpam_System()" is to call the function (or rather macro) "import_spam()" in its "mod_exec" function: static int client_module_exec(PyObject *m) { if (import_spam() < 0) { return -1; } /* additional initialization can happen here */ return 0; } The main disadvantage of this approach is that the file "spammodule.h" is rather complicated. However, the basic structure is the same for each function that is exported, so it has to be learned only once. Finally it should be mentioned that Capsules offer additional functionality, which is especially useful for memory allocation and deallocation of the pointer stored in a Capsule. The details are described in the Python/C API Reference Manual in the section Capsules and in the implementation of Capsules (files "Include/pycapsule.h" and "Objects/pycapsule.c" in the Python source code distribution). -[ Footnotes ]- [1] An interface for this function already exists in the standard module "os" — it was chosen as a simple and straightforward example. 
[2] The metaphor of “borrowing” a reference is not completely correct: the owner still has a copy of the reference. [3] Checking that the reference count is at least 1 **does not work** — the reference count itself could be in freed memory and may thus be reused for another object! [4] These guarantees don’t hold when you use the “old” style calling convention — this is still found in much existing code. Extending and Embedding the Python Interpreter ********************************************** This document describes how to write modules in C or C++ to extend the Python interpreter with new modules. Those modules can not only define new functions but also new object types and their methods. The document also describes how to embed the Python interpreter in another application, for use as an extension language. Finally, it shows how to compile and link extension modules so that they can be loaded dynamically (at run time) into the interpreter, if the underlying operating system supports this feature. This document assumes basic knowledge about Python. For an informal introduction to the language, see The Python Tutorial. The Python Language Reference gives a more formal definition of the language. The Python Standard Library documents the existing object types, functions and modules (both built-in and written in Python) that give the language its wide application range. For a detailed description of the whole Python/C API, see the separate Python/C API Reference Manual. Recommended third party tools ============================= This guide only covers the basic tools for creating extensions provided as part of this version of CPython. Some third party tools offer both simpler and more sophisticated approaches to creating C and C++ extensions for Python. Creating extensions without third party tools ============================================= This section of the guide covers creating C and C++ extensions without assistance from third party tools. It is intended primarily for creators of those tools, rather than being a recommended way to create your own C extensions. See also: **PEP 489** – Multi-phase extension module initialization * 1. Extending Python with C or C++ * 1.1. A Simple Example * 1.2. Intermezzo: Errors and Exceptions * 1.3. Back to the Example * 1.4. The Module’s Method Table and Initialization Function * 1.5. Compilation and Linkage * 1.6. Calling Python Functions from C * 1.7. Extracting Parameters in Extension Functions * 1.8. Keyword Parameters for Extension Functions * 1.9. Building Arbitrary Values * 1.10. Reference Counts * 1.11. Writing Extensions in C++ * 1.12. Providing a C API for an Extension Module * 2. Defining Extension Types: Tutorial * 2.1. The Basics * 2.2. Adding data and methods to the Basic example * 2.3. Providing finer control over data attributes * 2.4. Supporting cyclic garbage collection * 2.5. Subclassing other types * 3. Defining Extension Types: Assorted Topics * 3.1. Finalization and De-allocation * 3.2. Object Presentation * 3.3. Attribute Management * 3.4. Object Comparison * 3.5. Abstract Protocol Support * 3.6. Weak Reference Support * 3.7. More Suggestions * 4. Building C and C++ Extensions * 4.1. Building C and C++ Extensions with setuptools * 5. Building C and C++ Extensions on Windows * 5.1. A Cookbook Approach * 5.2. Differences Between Unix and Windows * 5.3. 
Using DLLs in Practice Embedding the CPython runtime in a larger application ===================================================== Sometimes, rather than creating an extension that runs inside the Python interpreter as the main application, it is desirable to instead embed the CPython runtime inside a larger application. This section covers some of the details involved in doing that successfully. * 1. Embedding Python in Another Application * 1.1. Very High Level Embedding * 1.2. Beyond Very High Level Embedding: An overview * 1.3. Pure Embedding * 1.4. Extending Embedded Python * 1.5. Embedding Python in C++ * 1.6. Compiling and Linking under Unix-like systems 3. Defining Extension Types: Assorted Topics ******************************************** This section aims to give a quick fly-by on the various type methods you can implement and what they do. Here is the definition of "PyTypeObject", with some fields only used in debug builds omitted: typedef struct _typeobject { PyObject_VAR_HEAD const char *tp_name; /* For printing, in format "<module>.<name>" */ Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */ /* Methods to implement standard operations */ destructor tp_dealloc; Py_ssize_t tp_vectorcall_offset; getattrfunc tp_getattr; setattrfunc tp_setattr; PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2) or tp_reserved (Python 3) */ reprfunc tp_repr; /* Method suites for standard classes */ PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping; /* More standard operations (here for binary compatibility) */ hashfunc tp_hash; ternaryfunc tp_call; reprfunc tp_str; getattrofunc tp_getattro; setattrofunc tp_setattro; /* Functions to access object as input/output buffer */ PyBufferProcs *tp_as_buffer; /* Flags to define presence of optional/expanded features */ unsigned long tp_flags; const char *tp_doc; /* Documentation string */ /* Assigned meaning in release 2.0 */ /* call function for all accessible objects */ traverseproc tp_traverse; /* delete references to contained objects */ inquiry tp_clear; /* Assigned meaning in release 2.1 */ /* rich comparisons */ richcmpfunc tp_richcompare; /* weak reference enabler */ Py_ssize_t tp_weaklistoffset; /* Iterators */ getiterfunc tp_iter; iternextfunc tp_iternext; /* Attribute descriptor and subclassing stuff */ struct PyMethodDef *tp_methods; struct PyMemberDef *tp_members; struct PyGetSetDef *tp_getset; // Strong reference on a heap type, borrowed reference on a static type struct _typeobject *tp_base; PyObject *tp_dict; descrgetfunc tp_descr_get; descrsetfunc tp_descr_set; Py_ssize_t tp_dictoffset; initproc tp_init; allocfunc tp_alloc; newfunc tp_new; freefunc tp_free; /* Low-level free-memory routine */ inquiry tp_is_gc; /* For PyObject_IS_GC */ PyObject *tp_bases; PyObject *tp_mro; /* method resolution order */ PyObject *tp_cache; PyObject *tp_subclasses; PyObject *tp_weaklist; destructor tp_del; /* Type attribute cache version tag. Added in version 2.6 */ unsigned int tp_version_tag; destructor tp_finalize; vectorcallfunc tp_vectorcall; /* bitset of which type-watchers care about this type */ unsigned char tp_watched; } PyTypeObject; Now that’s a *lot* of methods. Don’t worry too much though – if you have a type you want to define, the chances are very good that you will only implement a handful of these. As you probably expect by now, we’re going to go over this and give more information about the various handlers.
We won’t go in the order they are defined in the structure, because there is a lot of historical baggage that impacts the ordering of the fields. It’s often easiest to find an example that includes the fields you need and then change the values to suit your new type. const char *tp_name; /* For printing */ The name of the type – as mentioned in the previous chapter, this will appear in various places, almost entirely for diagnostic purposes. Try to choose something that will be helpful in such a situation! Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */ These fields tell the runtime how much memory to allocate when new objects of this type are created. Python has some built-in support for variable length structures (think: strings, tuples) which is where the "tp_itemsize" field comes in. This will be dealt with later. const char *tp_doc; Here you can put a string (or its address) that you want returned when the Python script references "obj.__doc__" to retrieve the doc string. Now we come to the basic type methods – the ones most extension types will implement. 3.1. Finalization and De-allocation =================================== destructor tp_dealloc; This function is called when the reference count of the instance of your type is reduced to zero and the Python interpreter wants to reclaim it. If your type has memory to free or other clean-up to perform, you can put it here. The object itself needs to be freed here as well. Here is an example of this function: static void newdatatype_dealloc(newdatatypeobject *obj) { free(obj->obj_UnderlyingDatatypePtr); Py_TYPE(obj)->tp_free((PyObject *)obj); } If your type supports garbage collection, the destructor should call "PyObject_GC_UnTrack()" before clearing any member fields: static void newdatatype_dealloc(newdatatypeobject *obj) { PyObject_GC_UnTrack(obj); Py_CLEAR(obj->other_obj); ... Py_TYPE(obj)->tp_free((PyObject *)obj); } One important requirement of the deallocator function is that it leaves any pending exceptions alone. This is important since deallocators are frequently called as the interpreter unwinds the Python stack; when the stack is unwound due to an exception (rather than normal returns), nothing is done to protect the deallocators from seeing that an exception has already been set. Any actions which a deallocator performs which may cause additional Python code to be executed may detect that an exception has been set. This can lead to misleading errors from the interpreter. The proper way to protect against this is to save a pending exception before performing the unsafe action, and restore it when done. This can be done using the "PyErr_Fetch()" and "PyErr_Restore()" functions: static void my_dealloc(PyObject *obj) { MyObject *self = (MyObject *) obj; PyObject *cbresult; if (self->my_callback != NULL) { PyObject *err_type, *err_value, *err_traceback; /* This saves the current exception state */ PyErr_Fetch(&err_type, &err_value, &err_traceback); cbresult = PyObject_CallNoArgs(self->my_callback); if (cbresult == NULL) PyErr_WriteUnraisable(self->my_callback); else Py_DECREF(cbresult); /* This restores the saved exception state */ PyErr_Restore(err_type, err_value, err_traceback); Py_DECREF(self->my_callback); } Py_TYPE(obj)->tp_free((PyObject*)self); } Note: There are limitations to what you can safely do in a deallocator function.
First, if your type supports garbage collection (using "tp_traverse" and/or "tp_clear"), some of the object’s members can have been cleared or finalized by the time "tp_dealloc" is called. Second, in "tp_dealloc", your object is in an unstable state: its reference count is equal to zero. Any call to a non-trivial object or API (as in the example above) might end up calling "tp_dealloc" again, causing a double free and a crash. Starting with Python 3.4, it is recommended not to put any complex finalization code in "tp_dealloc", and instead use the new "tp_finalize" type method. See also: **PEP 442** explains the new finalization scheme. 3.2. Object Presentation ======================== In Python, there are two ways to generate a textual representation of an object: the "repr()" function, and the "str()" function. (The "print()" function just calls "str()".) These handlers are both optional. reprfunc tp_repr; reprfunc tp_str; The "tp_repr" handler should return a string object containing a representation of the instance for which it is called. Here is a simple example: static PyObject * newdatatype_repr(newdatatypeobject *obj) { return PyUnicode_FromFormat("Repr-ified_newdatatype{{size:%d}}", obj->obj_UnderlyingDatatypePtr->size); } If no "tp_repr" handler is specified, the interpreter will supply a representation that uses the type’s "tp_name" and a uniquely identifying value for the object. The "tp_str" handler is to "str()" what the "tp_repr" handler described above is to "repr()"; that is, it is called when Python code calls "str()" on an instance of your object. Its implementation is very similar to the "tp_repr" function, but the resulting string is intended for human consumption. If "tp_str" is not specified, the "tp_repr" handler is used instead. Here is a simple example: static PyObject * newdatatype_str(newdatatypeobject *obj) { return PyUnicode_FromFormat("Stringified_newdatatype{{size:%d}}", obj->obj_UnderlyingDatatypePtr->size); } 3.3. Attribute Management ========================= For every object which can support attributes, the corresponding type must provide the functions that control how the attributes are resolved. There needs to be a function which can retrieve attributes (if any are defined), and another to set attributes (if setting attributes is allowed). Removing an attribute is a special case, for which the new value passed to the handler is "NULL". Python supports two pairs of attribute handlers; a type that supports attributes only needs to implement the functions for one pair. The difference is that one pair takes the name of the attribute as a char*, while the other accepts a PyObject*. Each type can use whichever pair makes more sense for the implementation’s convenience. getattrfunc tp_getattr; /* char * version */ setattrfunc tp_setattr; /* ... */ getattrofunc tp_getattro; /* PyObject * version */ setattrofunc tp_setattro; If accessing attributes of an object is always a simple operation (this will be explained shortly), there are generic implementations which can be used to provide the PyObject* version of the attribute management functions. The actual need for type-specific attribute handlers almost completely disappeared starting with Python 2.2, though there are many examples which have not been updated to use some of the new generic mechanisms that are available. 3.3.1. Generic Attribute Management ----------------------------------- Most extension types only use *simple* attributes. So, what makes the attributes simple?
There are only a couple of conditions that must be met: 1. The names of the attributes must be known when "PyType_Ready()" is called. 2. No special processing is needed to record that an attribute was looked up or set, nor do actions need to be taken based on the value. Note that this list does not place any restrictions on the values of the attributes, when the values are computed, or how relevant data is stored. When "PyType_Ready()" is called, it uses three tables referenced by the type object to create *descriptors* which are placed in the dictionary of the type object. Each descriptor controls access to one attribute of the instance object. Each of the tables is optional; if all three are "NULL", instances of the type will only have attributes that are inherited from their base type, and should leave the "tp_getattro" and "tp_setattro" fields "NULL" as well, allowing the base type to handle attributes. The tables are declared as three fields of the type object: struct PyMethodDef *tp_methods; struct PyMemberDef *tp_members; struct PyGetSetDef *tp_getset; If "tp_methods" is not "NULL", it must refer to an array of "PyMethodDef" structures. Each entry in the table is an instance of this structure: typedef struct PyMethodDef { const char *ml_name; /* method name */ PyCFunction ml_meth; /* implementation function */ int ml_flags; /* flags */ const char *ml_doc; /* docstring */ } PyMethodDef; One entry should be defined for each method provided by the type; no entries are needed for methods inherited from a base type. One additional entry is needed at the end; it is a sentinel that marks the end of the array. The "ml_name" field of the sentinel must be "NULL". The second table is used to define attributes which map directly to data stored in the instance. A variety of primitive C types are supported, and access may be read-only or read-write. The structures in the table are defined as: typedef struct PyMemberDef { const char *name; int type; int offset; int flags; const char *doc; } PyMemberDef; For each entry in the table, a *descriptor* will be constructed and added to the type; it will be able to extract a value from the instance structure. The "type" field should contain a type code like "Py_T_INT" or "Py_T_DOUBLE"; the value will be used to determine how to convert Python values to and from C values. The "flags" field is used to store flags which control how the attribute can be accessed: you can set it to "Py_READONLY" to prevent Python code from setting it. An interesting advantage of using the "tp_members" table to build descriptors that are used at runtime is that any attribute defined this way can have an associated doc string simply by providing the text in the table. An application can use the introspection API to retrieve the descriptor from the class object, and get the doc string using its "__doc__" attribute. As with the "tp_methods" table, a sentinel entry is required; here the "name" field of the sentinel must be "NULL". 3.3.2. Type-specific Attribute Management ----------------------------------------- For simplicity, only the char* version will be demonstrated here; the type of the name parameter is the only difference between the char* and PyObject* flavors of the interface. This example effectively does the same thing as the generic example above, but does not use the generic support added in Python 2.2. It explains how the handler functions are called, so that if you do need to extend their functionality, you’ll understand what needs to be done.
The "tp_getattr" handler is called when the object requires an attribute look-up. It is called in the same situations where the "__getattr__()" method of a class would be called. Here is an example: static PyObject * newdatatype_getattr(newdatatypeobject *obj, char *name) { if (strcmp(name, "data") == 0) { return PyLong_FromLong(obj->data); } PyErr_Format(PyExc_AttributeError, "'%.100s' object has no attribute '%.400s'", Py_TYPE(obj)->tp_name, name); return NULL; } The "tp_setattr" handler is called when the "__setattr__()" or "__delattr__()" method of a class instance would be called. When an attribute should be deleted, the third parameter will be "NULL". Here is an example that simply raises an exception; if this were really all you wanted, the "tp_setattr" handler should be set to "NULL". static int newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v) { PyErr_Format(PyExc_RuntimeError, "Read-only attribute: %s", name); return -1; } 3.4. Object Comparison ====================== richcmpfunc tp_richcompare; The "tp_richcompare" handler is called when comparisons are needed. It is analogous to the rich comparison methods, like "__lt__()", and also called by "PyObject_RichCompare()" and "PyObject_RichCompareBool()". This function is called with two Python objects and the operator as arguments, where the operator is one of "Py_EQ", "Py_NE", "Py_LE", "Py_GE", "Py_LT" or "Py_GT". It should compare the two objects with respect to the specified operator and return "Py_True" or "Py_False" if the comparison is successful, "Py_NotImplemented" to indicate that comparison is not implemented and the other object’s comparison method should be tried, or "NULL" if an exception was set. Here is a sample implementation, for a datatype that is considered equal if the size of an internal pointer is equal: static PyObject * newdatatype_richcmp(newdatatypeobject *obj1, newdatatypeobject *obj2, int op) { PyObject *result; int c, size1, size2; /* code to make sure that both arguments are of type newdatatype omitted */ size1 = obj1->obj_UnderlyingDatatypePtr->size; size2 = obj2->obj_UnderlyingDatatypePtr->size; switch (op) { case Py_LT: c = size1 < size2; break; case Py_LE: c = size1 <= size2; break; case Py_EQ: c = size1 == size2; break; case Py_NE: c = size1 != size2; break; case Py_GT: c = size1 > size2; break; case Py_GE: c = size1 >= size2; break; } result = c ? Py_True : Py_False; Py_INCREF(result); return result; } 3.5. Abstract Protocol Support ============================== Python supports a variety of *abstract* ‘protocols;’ the specific interfaces provided to use these interfaces are documented in Abstract Objects Layer. A number of these abstract interfaces were defined early in the development of the Python implementation. In particular, the number, mapping, and sequence protocols have been part of Python since the beginning. Other protocols have been added over time. For protocols which depend on several handler routines from the type implementation, the older protocols have been defined as optional blocks of handlers referenced by the type object. For newer protocols there are additional slots in the main type object, with a flag bit being set to indicate that the slots are present and should be checked by the interpreter. (The flag bit does not indicate that the slot values are non-"NULL". The flag may be set to indicate the presence of a slot, but a slot may still be unfilled.) 
PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping; If you wish your object to be able to act like a number, a sequence, or a mapping object, then you place the address of a structure that implements the C type "PyNumberMethods", "PySequenceMethods", or "PyMappingMethods", respectively. It is up to you to fill in this structure with appropriate values. You can find examples of the use of each of these in the "Objects" directory of the Python source distribution. hashfunc tp_hash; This function, if you choose to provide it, should return a hash number for an instance of your data type. Here is a simple example: static Py_hash_t newdatatype_hash(newdatatypeobject *obj) { Py_hash_t result; result = obj->some_size + 32767 * obj->some_number; if (result == -1) result = -2; return result; } "Py_hash_t" is a signed integer type with a platform-varying width. Returning "-1" from "tp_hash" indicates an error, which is why you should be careful to avoid returning it when hash computation is successful, as seen above. ternaryfunc tp_call; This function is called when an instance of your data type is “called”, for example, if "obj1" is an instance of your data type and the Python script contains "obj1('hello')", the "tp_call" handler is invoked. This function takes three arguments: 1. *self* is the instance of the data type which is the subject of the call. If the call is "obj1('hello')", then *self* is "obj1". 2. *args* is a tuple containing the arguments to the call. You can use "PyArg_ParseTuple()" to extract the arguments. 3. *kwds* is a dictionary of keyword arguments that were passed. If this is non-"NULL" and you support keyword arguments, use "PyArg_ParseTupleAndKeywords()" to extract the arguments. If you do not want to support keyword arguments and this is non-"NULL", raise a "TypeError" with a message saying that keyword arguments are not supported. Here is a toy "tp_call" implementation: static PyObject * newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *kwds) { PyObject *result; const char *arg1; const char *arg2; const char *arg3; if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) { return NULL; } result = PyUnicode_FromFormat( "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n", obj->obj_UnderlyingDatatypePtr->size, arg1, arg2, arg3); return result; } /* Iterators */ getiterfunc tp_iter; iternextfunc tp_iternext; These functions provide support for the iterator protocol. Both handlers take exactly one parameter, the instance for which they are being called, and return a new reference. In the case of an error, they should set an exception and return "NULL". "tp_iter" corresponds to the Python "__iter__()" method, while "tp_iternext" corresponds to the Python "__next__()" method. Any *iterable* object must implement the "tp_iter" handler, which must return an *iterator* object. Here the same guidelines apply as for Python classes: * For collections (such as lists and tuples) which can support multiple independent iterators, a new iterator should be created and returned by each call to "tp_iter". * Objects which can only be iterated over once (usually due to side effects of iteration, such as file objects) can implement "tp_iter" by returning a new reference to themselves – and should also therefore implement the "tp_iternext" handler. Any *iterator* object should implement both "tp_iter" and "tp_iternext". An iterator’s "tp_iter" handler should return a new reference to the iterator. 
Its "tp_iternext" handler should return a new reference to the next object in the iteration, if there is one. If the iteration has reached the end, "tp_iternext" may return "NULL" without setting an exception, or it may set "StopIteration" *in addition* to returning "NULL"; avoiding the exception can yield slightly better performance. If an actual error occurs, "tp_iternext" should always set an exception and return "NULL". 3.6. Weak Reference Support =========================== One of the goals of Python’s weak reference implementation is to allow any type to participate in the weak reference mechanism without incurring the overhead on performance-critical objects (such as numbers). See also: Documentation for the "weakref" module. For an object to be weakly referenceable, the extension type must set the "Py_TPFLAGS_MANAGED_WEAKREF" bit of the "tp_flags" field. The legacy "tp_weaklistoffset" field should be left as zero. Concretely, here is how the statically declared type object would look: static PyTypeObject TrivialType = { PyVarObject_HEAD_INIT(NULL, 0) /* ... other members omitted for brevity ... */ .tp_flags = Py_TPFLAGS_MANAGED_WEAKREF | ..., }; The only further addition is that "tp_dealloc" needs to clear any weak references (by calling "PyObject_ClearWeakRefs()"): static void Trivial_dealloc(TrivialObject *self) { /* Clear weakrefs first before calling any destructors */ PyObject_ClearWeakRefs((PyObject *) self); /* ... remainder of destruction code omitted for brevity ... */ Py_TYPE(self)->tp_free((PyObject *) self); } 3.7. More Suggestions ===================== In order to learn how to implement any specific method for your new data type, get the *CPython* source code. Go to the "Objects" directory, then search the C source files for "tp_" plus the function you want (for example, "tp_richcompare"). You will find examples of the function you want to implement. When you need to verify that an object is a concrete instance of the type you are implementing, use the "PyObject_TypeCheck()" function. A sample of its use might be something like the following: if (!PyObject_TypeCheck(some_object, &MyType)) { PyErr_SetString(PyExc_TypeError, "arg #1 not a mything"); return NULL; } See also: Download CPython source releases. https://www.python.org/downloads/source/ The CPython project on GitHub, where the CPython source code is developed. https://github.com/python/cpython 2. Defining Extension Types: Tutorial ************************************* Python allows the writer of a C extension module to define new types that can be manipulated from Python code, much like the built-in "str" and "list" types. The code for all extension types follows a pattern, but there are some details that you need to understand before you can get started. This document is a gentle introduction to the topic. 2.1. The Basics =============== The *CPython* runtime sees all Python objects as variables of type PyObject*, which serves as a “base type” for all Python objects. The "PyObject" structure itself only contains the object’s *reference count* and a pointer to the object’s “type object”. This is where the action is; the type object determines which (C) functions get called by the interpreter when, for instance, an attribute gets looked up on an object, a method called, or it is multiplied by another object. These C functions are called “type methods”. So, if you want to define a new extension type, you need to create a new type object. 
This sort of thing can only be explained by example, so here’s a minimal, but complete, module that defines a new type named "Custom" inside a C extension module "custom": Note: What we’re showing here is the traditional way of defining *static* extension types. It should be adequate for most uses. The C API also allows defining heap-allocated extension types using the "PyType_FromSpec()" function, which isn’t covered in this tutorial. #define PY_SSIZE_T_CLEAN #include <Python.h> typedef struct { PyObject_HEAD /* Type-specific fields go here. */ } CustomObject; static PyTypeObject CustomType = { .ob_base = PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "custom.Custom", .tp_doc = PyDoc_STR("Custom objects"), .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT, .tp_new = PyType_GenericNew, }; static int custom_module_exec(PyObject *m) { if (PyType_Ready(&CustomType) < 0) { return -1; } if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) { return -1; } return 0; } static PyModuleDef_Slot custom_module_slots[] = { {Py_mod_exec, custom_module_exec}, // Just use this while using static types {Py_mod_multiple_interpreters, Py_MOD_MULTIPLE_INTERPRETERS_NOT_SUPPORTED}, {0, NULL} }; static PyModuleDef custom_module = { .m_base = PyModuleDef_HEAD_INIT, .m_name = "custom", .m_doc = "Example module that creates an extension type.", .m_size = 0, .m_slots = custom_module_slots, }; PyMODINIT_FUNC PyInit_custom(void) { return PyModuleDef_Init(&custom_module); } Now that’s quite a bit to take in at once, but hopefully bits will seem familiar from the previous chapter. This file defines three things: 1. What a "Custom" **object** contains: this is the "CustomObject" struct, which is allocated once for each "Custom" instance. 2. How the "Custom" **type** behaves: this is the "CustomType" struct, which defines a set of flags and function pointers that the interpreter inspects when specific operations are requested. 3. How to define and execute the "custom" module: this is the "PyInit_custom" function and the associated "custom_module" struct for defining the module, and the "custom_module_exec" function to set up a fresh module object. The first bit is: typedef struct { PyObject_HEAD } CustomObject; This is what a Custom object will contain. "PyObject_HEAD" is mandatory at the start of each object struct and defines a field called "ob_base" of type "PyObject", containing a pointer to a type object and a reference count (these can be accessed using the macros "Py_TYPE" and "Py_REFCNT" respectively). The reason for the macro is to abstract away the layout and to enable additional fields in debug builds. Note: There is no semicolon above after the "PyObject_HEAD" macro. Be wary of adding one by accident: some compilers will complain. Of course, objects generally store additional data besides the standard "PyObject_HEAD" boilerplate; for example, here is the definition for standard Python floats: typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject; The second bit is the definition of the type object. static PyTypeObject CustomType = { .ob_base = PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "custom.Custom", .tp_doc = PyDoc_STR("Custom objects"), .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT, .tp_new = PyType_GenericNew, }; Note: We recommend using C99-style designated initializers as above, to avoid listing all the "PyTypeObject" fields that you don’t care about and also to avoid caring about the fields’ declaration order.
The actual definition of "PyTypeObject" in "object.h" has many more fields than the definition above. The remaining fields will be filled with zeros by the C compiler, and it’s common practice to not specify them explicitly unless you need them. We’re going to pick it apart, one field at a time: .ob_base = PyVarObject_HEAD_INIT(NULL, 0) This line is mandatory boilerplate to initialize the "ob_base" field mentioned above. .tp_name = "custom.Custom", The name of our type. This will appear in the default textual representation of our objects and in some error messages, for example: >>> "" + custom.Custom() Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: can only concatenate str (not "custom.Custom") to str Note that the name is a dotted name that includes both the module name and the name of the type within the module. The module in this case is "custom" and the type is "Custom", so we set the type name to "custom.Custom". Using the real dotted import path is important to make your type compatible with the "pydoc" and "pickle" modules. .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, This is so that Python knows how much memory to allocate when creating new "Custom" instances. "tp_itemsize" is only used for variable-sized objects and should otherwise be zero. Note: If you want your type to be subclassable from Python, and your type has the same "tp_basicsize" as its base type, you may have problems with multiple inheritance. A Python subclass of your type will have to list your type first in its "__bases__", or else it will not be able to call your type’s "__new__()" method without getting an error. You can avoid this problem by ensuring that your type has a larger value for "tp_basicsize" than its base type does. Most of the time, this will be true anyway, because either your base type will be "object", or else you will be adding data members to your base type, and therefore increasing its size. We set the class flags to "Py_TPFLAGS_DEFAULT". .tp_flags = Py_TPFLAGS_DEFAULT, All types should include this constant in their flags. It enables all of the members defined until at least Python 3.3. If you need further members, you will need to OR the corresponding flags. We provide a doc string for the type in "tp_doc". .tp_doc = PyDoc_STR("Custom objects"), To enable object creation, we have to provide a "tp_new" handler. This is the equivalent of the Python method "__new__()", but has to be specified explicitly. In this case, we can just use the default implementation provided by the API function "PyType_GenericNew()". .tp_new = PyType_GenericNew, Everything else in the file should be familiar, except for some code in "custom_module_exec()": if (PyType_Ready(&CustomType) < 0) { return -1; } This initializes the "Custom" type, filling in a number of members to the appropriate default values, including "ob_type" that we initially set to "NULL". if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) { return -1; } This adds the type to the module dictionary. This allows us to create "Custom" instances by calling the "Custom" class: >>> import custom >>> mycustom = custom.Custom() That’s it!
All that remains is to build it; put the above code in a file called "custom.c", [build-system] requires = ["setuptools"] build-backend = "setuptools.build_meta" [project] name = "custom" version = "1" in a file called "pyproject.toml", and from setuptools import Extension, setup setup(ext_modules=[Extension("custom", ["custom.c"])]) in a file called "setup.py"; then typing $ python -m pip install . in a shell should produce a file "custom.so" in a subdirectory and install it; now fire up Python — you should be able to "import custom" and play around with "Custom" objects. That wasn’t so hard, was it? Of course, the current Custom type is pretty uninteresting. It has no data and doesn’t do anything. It can’t even be subclassed. 2.2. Adding data and methods to the Basic example ================================================= Let’s extend the basic example to add some data and methods. Let’s also make the type usable as a base class. We’ll create a new module, "custom2", that adds these capabilities: #define PY_SSIZE_T_CLEAN #include <Python.h> #include <stddef.h> /* for offsetof() */ typedef struct { PyObject_HEAD PyObject *first; /* first name */ PyObject *last; /* last name */ int number; } CustomObject; static void Custom_dealloc(CustomObject *self) { Py_XDECREF(self->first); Py_XDECREF(self->last); Py_TYPE(self)->tp_free((PyObject *) self); } static PyObject * Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { CustomObject *self; self = (CustomObject *) type->tp_alloc(type, 0); if (self != NULL) { self->first = PyUnicode_FromString(""); if (self->first == NULL) { Py_DECREF(self); return NULL; } self->last = PyUnicode_FromString(""); if (self->last == NULL) { Py_DECREF(self); return NULL; } self->number = 0; } return (PyObject *) self; } static int Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) { static char *kwlist[] = {"first", "last", "number", NULL}; PyObject *first = NULL, *last = NULL; if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist, &first, &last, &self->number)) return -1; if (first) { Py_XSETREF(self->first, Py_NewRef(first)); } if (last) { Py_XSETREF(self->last, Py_NewRef(last)); } return 0; } static PyMemberDef Custom_members[] = { {"first", Py_T_OBJECT_EX, offsetof(CustomObject, first), 0, "first name"}, {"last", Py_T_OBJECT_EX, offsetof(CustomObject, last), 0, "last name"}, {"number", Py_T_INT, offsetof(CustomObject, number), 0, "custom number"}, {NULL} /* Sentinel */ }; static PyObject * Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored)) { if (self->first == NULL) { PyErr_SetString(PyExc_AttributeError, "first"); return NULL; } if (self->last == NULL) { PyErr_SetString(PyExc_AttributeError, "last"); return NULL; } return PyUnicode_FromFormat("%S %S", self->first, self->last); } static PyMethodDef Custom_methods[] = { {"name", (PyCFunction) Custom_name, METH_NOARGS, "Return the name, combining the first and last name" }, {NULL} /* Sentinel */ }; static PyTypeObject CustomType = { .ob_base = PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "custom2.Custom", .tp_doc = PyDoc_STR("Custom objects"), .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, .tp_new = Custom_new, .tp_init = (initproc) Custom_init, .tp_dealloc = (destructor) Custom_dealloc, .tp_members = Custom_members, .tp_methods = Custom_methods, }; static int custom_module_exec(PyObject *m) { if (PyType_Ready(&CustomType) < 0) { return -1; } if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) { return -1; } return 0;
} static PyModuleDef_Slot custom_module_slots[] = { {Py_mod_exec, custom_module_exec}, {Py_mod_multiple_interpreters, Py_MOD_MULTIPLE_INTERPRETERS_NOT_SUPPORTED}, {0, NULL} }; static PyModuleDef custom_module = { .m_base = PyModuleDef_HEAD_INIT, .m_name = "custom2", .m_doc = "Example module that creates an extension type.", .m_size = 0, .m_slots = custom_module_slots, }; PyMODINIT_FUNC PyInit_custom2(void) { return PyModuleDef_Init(&custom_module); } This version of the module has a number of changes. The "Custom" type now has three data attributes in its C struct, *first*, *last*, and *number*. The *first* and *last* variables are Python strings containing first and last names. The *number* attribute is a C integer. The object structure is updated accordingly: typedef struct { PyObject_HEAD PyObject *first; /* first name */ PyObject *last; /* last name */ int number; } CustomObject; Because we now have data to manage, we have to be more careful about object allocation and deallocation. At a minimum, we need a deallocation method: static void Custom_dealloc(CustomObject *self) { Py_XDECREF(self->first); Py_XDECREF(self->last); Py_TYPE(self)->tp_free((PyObject *) self); } which is assigned to the "tp_dealloc" member: .tp_dealloc = (destructor) Custom_dealloc, This method first clears the reference counts of the two Python attributes. "Py_XDECREF()" correctly handles the case where its argument is "NULL" (which might happen here if "tp_new" failed midway). It then calls the "tp_free" member of the object’s type (computed by "Py_TYPE(self)") to free the object’s memory. Note that the object’s type might not be "CustomType", because the object may be an instance of a subclass. Note: The explicit cast to "destructor" above is needed because we defined "Custom_dealloc" to take a "CustomObject *" argument, but the "tp_dealloc" function pointer expects to receive a "PyObject *" argument. Otherwise, the compiler will emit a warning. This is object-oriented polymorphism, in C! We want to make sure that the first and last names are initialized to empty strings, so we provide a "tp_new" implementation: static PyObject * Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { CustomObject *self; self = (CustomObject *) type->tp_alloc(type, 0); if (self != NULL) { self->first = PyUnicode_FromString(""); if (self->first == NULL) { Py_DECREF(self); return NULL; } self->last = PyUnicode_FromString(""); if (self->last == NULL) { Py_DECREF(self); return NULL; } self->number = 0; } return (PyObject *) self; } and install it in the "tp_new" member: .tp_new = Custom_new, The "tp_new" handler is responsible for creating (as opposed to initializing) objects of the type. It is exposed in Python as the "__new__()" method. It is not required to define a "tp_new" member, and indeed many extension types will simply reuse "PyType_GenericNew()" as done in the first version of the "Custom" type above. In this case, we use the "tp_new" handler to initialize the "first" and "last" attributes to non-"NULL" default values. "tp_new" is passed the type being instantiated (not necessarily "CustomType", if a subclass is instantiated) and any arguments passed when the type was called, and is expected to return the instance created. "tp_new" handlers always accept positional and keyword arguments, but they often ignore the arguments, leaving the argument handling to initializer (a.k.a. "tp_init" in C or "__init__" in Python) methods. Note: "tp_new" shouldn’t call "tp_init" explicitly, as the interpreter will do it itself. 
The "tp_new" implementation calls the "tp_alloc" slot to allocate memory: self = (CustomObject *) type->tp_alloc(type, 0); Since memory allocation may fail, we must check the "tp_alloc" result against "NULL" before proceeding. Note: We didn’t fill the "tp_alloc" slot ourselves. Rather "PyType_Ready()" fills it for us by inheriting it from our base class, which is "object" by default. Most types use the default allocation strategy. Note: If you are creating a co-operative "tp_new" (one that calls a base type’s "tp_new" or "__new__()"), you must *not* try to determine what method to call using method resolution order at runtime. Always statically determine what type you are going to call, and call its "tp_new" directly, or via "type->tp_base->tp_new". If you do not do this, Python subclasses of your type that also inherit from other Python-defined classes may not work correctly. (Specifically, you may not be able to create instances of such subclasses without getting a "TypeError".) We also define an initialization function which accepts arguments to provide initial values for our instance: static int Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) { static char *kwlist[] = {"first", "last", "number", NULL}; PyObject *first = NULL, *last = NULL, *tmp; if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist, &first, &last, &self->number)) return -1; if (first) { tmp = self->first; Py_INCREF(first); self->first = first; Py_XDECREF(tmp); } if (last) { tmp = self->last; Py_INCREF(last); self->last = last; Py_XDECREF(tmp); } return 0; } by filling the "tp_init" slot. .tp_init = (initproc) Custom_init, The "tp_init" slot is exposed in Python as the "__init__()" method. It is used to initialize an object after it’s created. Initializers always accept positional and keyword arguments, and they should return either "0" on success or "-1" on error. Unlike the "tp_new" handler, there is no guarantee that "tp_init" is called at all (for example, the "pickle" module by default doesn’t call "__init__()" on unpickled instances). It can also be called multiple times. Anyone can call the "__init__()" method on our objects. For this reason, we have to be extra careful when assigning the new attribute values. We might be tempted, for example to assign the "first" member like this: if (first) { Py_XDECREF(self->first); Py_INCREF(first); self->first = first; } But this would be risky. Our type doesn’t restrict the type of the "first" member, so it could be any kind of object. It could have a destructor that causes code to be executed that tries to access the "first" member; or that destructor could release the *Global interpreter Lock* and let arbitrary code run in other threads that accesses and modifies our object. To be paranoid and protect ourselves against this possibility, we almost always reassign members before decrementing their reference counts. When don’t we have to do this? * when we absolutely know that the reference count is greater than 1; * when we know that deallocation of the object [1] will neither release the *GIL* nor cause any calls back into our type’s code; * when decrementing a reference count in a "tp_dealloc" handler on a type which doesn’t support cyclic garbage collection [2]. We want to expose our instance variables as attributes. There are a number of ways to do that. 
The simplest way is to define member definitions: static PyMemberDef Custom_members[] = { {"first", Py_T_OBJECT_EX, offsetof(CustomObject, first), 0, "first name"}, {"last", Py_T_OBJECT_EX, offsetof(CustomObject, last), 0, "last name"}, {"number", Py_T_INT, offsetof(CustomObject, number), 0, "custom number"}, {NULL} /* Sentinel */ }; and put the definitions in the "tp_members" slot: .tp_members = Custom_members, Each member definition has a member name, type, offset, access flags and documentation string. See the Generic Attribute Management section below for details. A disadvantage of this approach is that it doesn’t provide a way to restrict the types of objects that can be assigned to the Python attributes. We expect the first and last names to be strings, but any Python objects can be assigned. Further, the attributes can be deleted, setting the C pointers to "NULL". Even though we can make sure the members are initialized to non-"NULL" values, the members can be set to "NULL" if the attributes are deleted. We define a single method, "Custom.name()", that outputs the object’s name as the concatenation of the first and last names. static PyObject * Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored)) { if (self->first == NULL) { PyErr_SetString(PyExc_AttributeError, "first"); return NULL; } if (self->last == NULL) { PyErr_SetString(PyExc_AttributeError, "last"); return NULL; } return PyUnicode_FromFormat("%S %S", self->first, self->last); } The method is implemented as a C function that takes a "Custom" (or "Custom" subclass) instance as the first argument. Methods always take an instance as the first argument. Methods often take positional and keyword arguments as well, but in this case we don’t take any and don’t need to accept a positional argument tuple or keyword argument dictionary. This method is equivalent to the Python method: def name(self): return "%s %s" % (self.first, self.last) Note that we have to check for the possibility that our "first" and "last" members are "NULL". This is because they can be deleted, in which case they are set to "NULL". It would be better to prevent deletion of these attributes and to restrict the attribute values to be strings. We’ll see how to do that in the next section. Now that we’ve defined the method, we need to create an array of method definitions: static PyMethodDef Custom_methods[] = { {"name", (PyCFunction) Custom_name, METH_NOARGS, "Return the name, combining the first and last name" }, {NULL} /* Sentinel */ }; (note that we used the "METH_NOARGS" flag to indicate that the method is expecting no arguments other than *self*) and assign it to the "tp_methods" slot: .tp_methods = Custom_methods, Finally, we’ll make our type usable as a base class for subclassing. We’ve written our methods carefully so far so that they don’t make any assumptions about the type of the object being created or used, so all we need to do is to add the "Py_TPFLAGS_BASETYPE" flag to our class flag definition: .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, We rename "PyInit_custom()" to "PyInit_custom2()", update the module name in the "PyModuleDef" struct, and update the full class name in the "PyTypeObject" struct. Finally, we update our "setup.py" file to include the new module, from setuptools import Extension, setup setup(ext_modules=[ Extension("custom", ["custom.c"]), Extension("custom2", ["custom2.c"]), ]) and then we re-install so that we can "import custom2": $ python -m pip install . 2.3.
2.3. Providing finer control over data attributes
=================================================

In this section, we'll provide finer control over how the "first" and "last" attributes are set in the "Custom" example. In the previous version of our module, the instance variables "first" and "last" could be set to non-string values or even deleted. We want to make sure that these attributes always contain strings.

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>
   #include <stddef.h> /* for offsetof() */

   typedef struct {
       PyObject_HEAD
       PyObject *first; /* first name */
       PyObject *last;  /* last name */
       int number;
   } CustomObject;

   static void
   Custom_dealloc(CustomObject *self)
   {
       Py_XDECREF(self->first);
       Py_XDECREF(self->last);
       Py_TYPE(self)->tp_free((PyObject *) self);
   }

   static PyObject *
   Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
   {
       CustomObject *self;
       self = (CustomObject *) type->tp_alloc(type, 0);
       if (self != NULL) {
           self->first = PyUnicode_FromString("");
           if (self->first == NULL) {
               Py_DECREF(self);
               return NULL;
           }
           self->last = PyUnicode_FromString("");
           if (self->last == NULL) {
               Py_DECREF(self);
               return NULL;
           }
           self->number = 0;
       }
       return (PyObject *) self;
   }

   static int
   Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
   {
       static char *kwlist[] = {"first", "last", "number", NULL};
       PyObject *first = NULL, *last = NULL;

       if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                        &first, &last,
                                        &self->number))
           return -1;

       if (first) {
           Py_SETREF(self->first, Py_NewRef(first));
       }
       if (last) {
           Py_SETREF(self->last, Py_NewRef(last));
       }
       return 0;
   }

   static PyMemberDef Custom_members[] = {
       {"number", Py_T_INT, offsetof(CustomObject, number), 0,
        "custom number"},
       {NULL}  /* Sentinel */
   };

   static PyObject *
   Custom_getfirst(CustomObject *self, void *closure)
   {
       return Py_NewRef(self->first);
   }

   static int
   Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
   {
       if (value == NULL) {
           PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
           return -1;
       }
       if (!PyUnicode_Check(value)) {
           PyErr_SetString(PyExc_TypeError,
                           "The first attribute value must be a string");
           return -1;
       }
       Py_SETREF(self->first, Py_NewRef(value));
       return 0;
   }

   static PyObject *
   Custom_getlast(CustomObject *self, void *closure)
   {
       return Py_NewRef(self->last);
   }

   static int
   Custom_setlast(CustomObject *self, PyObject *value, void *closure)
   {
       if (value == NULL) {
           PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute");
           return -1;
       }
       if (!PyUnicode_Check(value)) {
           PyErr_SetString(PyExc_TypeError,
                           "The last attribute value must be a string");
           return -1;
       }
       Py_SETREF(self->last, Py_NewRef(value));
       return 0;
   }

   static PyGetSetDef Custom_getsetters[] = {
       {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
        "first name", NULL},
       {"last", (getter) Custom_getlast, (setter) Custom_setlast,
        "last name", NULL},
       {NULL}  /* Sentinel */
   };

   static PyObject *
   Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
   {
       return PyUnicode_FromFormat("%S %S", self->first, self->last);
   }

   static PyMethodDef Custom_methods[] = {
       {"name", (PyCFunction) Custom_name, METH_NOARGS,
        "Return the name, combining the first and last name"
       },
       {NULL}  /* Sentinel */
   };

   static PyTypeObject CustomType = {
       .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "custom3.Custom",
       .tp_doc = PyDoc_STR("Custom objects"),
       .tp_basicsize = sizeof(CustomObject),
       .tp_itemsize = 0,
       .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
       .tp_new = Custom_new,
       .tp_init = (initproc) Custom_init,
       .tp_dealloc = (destructor) Custom_dealloc,
       .tp_members = Custom_members,
       .tp_methods = Custom_methods,
       .tp_getset = Custom_getsetters,
   };

   static int
   custom_module_exec(PyObject *m)
   {
       if (PyType_Ready(&CustomType) < 0) {
           return -1;
       }
       if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
           return -1;
       }
       return 0;
   }

   static PyModuleDef_Slot custom_module_slots[] = {
       {Py_mod_exec, custom_module_exec},
       {Py_mod_multiple_interpreters, Py_MOD_MULTIPLE_INTERPRETERS_NOT_SUPPORTED},
       {0, NULL}
   };

   static PyModuleDef custom_module = {
       .m_base = PyModuleDef_HEAD_INIT,
       .m_name = "custom3",
       .m_doc = "Example module that creates an extension type.",
       .m_size = 0,
       .m_slots = custom_module_slots,
   };

   PyMODINIT_FUNC
   PyInit_custom3(void)
   {
       return PyModuleDef_Init(&custom_module);
   }

To provide greater control over the "first" and "last" attributes, we'll use custom getter and setter functions. Here are the functions for getting and setting the "first" attribute:

   static PyObject *
   Custom_getfirst(CustomObject *self, void *closure)
   {
       Py_INCREF(self->first);
       return self->first;
   }

   static int
   Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
   {
       PyObject *tmp;
       if (value == NULL) {
           PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
           return -1;
       }
       if (!PyUnicode_Check(value)) {
           PyErr_SetString(PyExc_TypeError,
                           "The first attribute value must be a string");
           return -1;
       }
       tmp = self->first;
       Py_INCREF(value);
       self->first = value;
       Py_DECREF(tmp);
       return 0;
   }

The getter function is passed a "Custom" object and a “closure”, which is a void pointer. In this case, the closure is ignored. (The closure supports an advanced usage in which definition data is passed to the getter and setter. This could, for example, be used to allow a single set of getter and setter functions that decide the attribute to get or set based on data in the closure.)

The setter function is passed the "Custom" object, the new value, and the closure. The new value may be "NULL", in which case the attribute is being deleted. In our setter, we raise an error if the attribute is deleted or if its new value is not a string.

We create an array of "PyGetSetDef" structures:

   static PyGetSetDef Custom_getsetters[] = {
       {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
        "first name", NULL},
       {"last", (getter) Custom_getlast, (setter) Custom_setlast,
        "last name", NULL},
       {NULL}  /* Sentinel */
   };

and register it in the "tp_getset" slot:

   .tp_getset = Custom_getsetters,

The last item in a "PyGetSetDef" structure is the “closure” mentioned above. In this case, we aren't using a closure, so we just pass "NULL".

We also remove the member definitions for these attributes:

   static PyMemberDef Custom_members[] = {
       {"number", Py_T_INT, offsetof(CustomObject, number), 0,
        "custom number"},
       {NULL}  /* Sentinel */
   };

We also need to update the "tp_init" handler to only allow strings [3] to be passed:

   static int
   Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
   {
       static char *kwlist[] = {"first", "last", "number", NULL};
       PyObject *first = NULL, *last = NULL, *tmp;

       if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                        &first, &last,
                                        &self->number))
           return -1;

       if (first) {
           tmp = self->first;
           Py_INCREF(first);
           self->first = first;
           Py_DECREF(tmp);
       }
       if (last) {
           tmp = self->last;
           Py_INCREF(last);
           self->last = last;
           Py_DECREF(tmp);
       }
       return 0;
   }

With these changes, we can ensure that the "first" and "last" members are never "NULL", so we can remove checks for "NULL" values in almost all cases.
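Assuming the "custom3" module has been built and installed like the previous versions, a quick interactive session (again, only a sketch) shows the stricter behavior of the new getters and setters:

   >>> import custom3
   >>> c = custom3.Custom()
   >>> c.first
   ''
   >>> c.first = 42
   Traceback (most recent call last):
     ...
   TypeError: The first attribute value must be a string
   >>> del c.last
   Traceback (most recent call last):
     ...
   TypeError: Cannot delete the last attribute
   >>> c.first = "Terry"
   >>> c.last = "Gilliam"
   >>> c.name()
   'Terry Gilliam'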
In turn, most of the "Py_XDECREF()" calls can be converted to "Py_DECREF()" calls. The only place we can't change these calls is in the "tp_dealloc" implementation, where there is the possibility that the initialization of these members failed in "tp_new".

We also rename the module initialization function and module name in the initialization function, as we did before, and we add an extra definition to the "setup.py" file.

2.4. Supporting cyclic garbage collection
=========================================

Python has a *cyclic garbage collector (GC)* that can identify unneeded objects even when their reference counts are not zero. This can happen when objects are involved in cycles. For example, consider:

   >>> l = []
   >>> l.append(l)
   >>> del l

In this example, we create a list that contains itself. When we delete it, it still has a reference from itself. Its reference count doesn't drop to zero. Fortunately, Python's cyclic garbage collector will eventually figure out that the list is garbage and free it.

In the second version of the "Custom" example, we allowed any kind of object to be stored in the "first" or "last" attributes [4]. In addition, in the second and third versions, we allowed subclassing "Custom", and subclasses may add arbitrary attributes. For either of those reasons, "Custom" objects can participate in cycles:

   >>> import custom3
   >>> class Derived(custom3.Custom): pass
   ...
   >>> n = Derived()
   >>> n.some_attribute = n

To allow a "Custom" instance participating in a reference cycle to be properly detected and collected by the cyclic GC, our "Custom" type needs to fill two additional slots and to enable a flag that enables these slots:

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>
   #include <stddef.h> /* for offsetof() */

   typedef struct {
       PyObject_HEAD
       PyObject *first; /* first name */
       PyObject *last;  /* last name */
       int number;
   } CustomObject;

   static int
   Custom_traverse(CustomObject *self, visitproc visit, void *arg)
   {
       Py_VISIT(self->first);
       Py_VISIT(self->last);
       return 0;
   }

   static int
   Custom_clear(CustomObject *self)
   {
       Py_CLEAR(self->first);
       Py_CLEAR(self->last);
       return 0;
   }

   static void
   Custom_dealloc(CustomObject *self)
   {
       PyObject_GC_UnTrack(self);
       Custom_clear(self);
       Py_TYPE(self)->tp_free((PyObject *) self);
   }

   static PyObject *
   Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
   {
       CustomObject *self;
       self = (CustomObject *) type->tp_alloc(type, 0);
       if (self != NULL) {
           self->first = PyUnicode_FromString("");
           if (self->first == NULL) {
               Py_DECREF(self);
               return NULL;
           }
           self->last = PyUnicode_FromString("");
           if (self->last == NULL) {
               Py_DECREF(self);
               return NULL;
           }
           self->number = 0;
       }
       return (PyObject *) self;
   }

   static int
   Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
   {
       static char *kwlist[] = {"first", "last", "number", NULL};
       PyObject *first = NULL, *last = NULL;

       if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                        &first, &last,
                                        &self->number))
           return -1;

       if (first) {
           Py_SETREF(self->first, Py_NewRef(first));
       }
       if (last) {
           Py_SETREF(self->last, Py_NewRef(last));
       }
       return 0;
   }

   static PyMemberDef Custom_members[] = {
       {"number", Py_T_INT, offsetof(CustomObject, number), 0,
        "custom number"},
       {NULL}  /* Sentinel */
   };

   static PyObject *
   Custom_getfirst(CustomObject *self, void *closure)
   {
       return Py_NewRef(self->first);
   }

   static int
   Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
   {
       if (value == NULL) {
           PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
           return -1;
       }
       if (!PyUnicode_Check(value)) {
           PyErr_SetString(PyExc_TypeError,
                           "The first attribute value must be a string");
           return -1;
       }
       Py_XSETREF(self->first, Py_NewRef(value));
       return 0;
   }

   static PyObject *
   Custom_getlast(CustomObject *self, void *closure)
   {
       return Py_NewRef(self->last);
   }

   static int
   Custom_setlast(CustomObject *self, PyObject *value, void *closure)
   {
       if (value == NULL) {
           PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute");
           return -1;
       }
       if (!PyUnicode_Check(value)) {
           PyErr_SetString(PyExc_TypeError,
                           "The last attribute value must be a string");
           return -1;
       }
       Py_XSETREF(self->last, Py_NewRef(value));
       return 0;
   }

   static PyGetSetDef Custom_getsetters[] = {
       {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
        "first name", NULL},
       {"last", (getter) Custom_getlast, (setter) Custom_setlast,
        "last name", NULL},
       {NULL}  /* Sentinel */
   };

   static PyObject *
   Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
   {
       return PyUnicode_FromFormat("%S %S", self->first, self->last);
   }

   static PyMethodDef Custom_methods[] = {
       {"name", (PyCFunction) Custom_name, METH_NOARGS,
        "Return the name, combining the first and last name"
       },
       {NULL}  /* Sentinel */
   };

   static PyTypeObject CustomType = {
       .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "custom4.Custom",
       .tp_doc = PyDoc_STR("Custom objects"),
       .tp_basicsize = sizeof(CustomObject),
       .tp_itemsize = 0,
       .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC,
       .tp_new = Custom_new,
       .tp_init = (initproc) Custom_init,
       .tp_dealloc = (destructor) Custom_dealloc,
       .tp_traverse = (traverseproc) Custom_traverse,
       .tp_clear = (inquiry) Custom_clear,
       .tp_members = Custom_members,
       .tp_methods = Custom_methods,
       .tp_getset = Custom_getsetters,
   };

   static int
   custom_module_exec(PyObject *m)
   {
       if (PyType_Ready(&CustomType) < 0) {
           return -1;
       }
       if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
           return -1;
       }
       return 0;
   }

   static PyModuleDef_Slot custom_module_slots[] = {
       {Py_mod_exec, custom_module_exec},
       {Py_mod_multiple_interpreters, Py_MOD_MULTIPLE_INTERPRETERS_NOT_SUPPORTED},
       {0, NULL}
   };

   static PyModuleDef custom_module = {
       .m_base = PyModuleDef_HEAD_INIT,
       .m_name = "custom4",
       .m_doc = "Example module that creates an extension type.",
       .m_size = 0,
       .m_slots = custom_module_slots,
   };

   PyMODINIT_FUNC
   PyInit_custom4(void)
   {
       return PyModuleDef_Init(&custom_module);
   }

First, the traversal method lets the cyclic GC know about subobjects that could participate in cycles:

   static int
   Custom_traverse(CustomObject *self, visitproc visit, void *arg)
   {
       int vret;
       if (self->first) {
           vret = visit(self->first, arg);
           if (vret != 0)
               return vret;
       }
       if (self->last) {
           vret = visit(self->last, arg);
           if (vret != 0)
               return vret;
       }
       return 0;
   }

For each subobject that can participate in cycles, we need to call the "visit()" function, which is passed to the traversal method. The "visit()" function takes as arguments the subobject and the extra argument *arg* passed to the traversal method. It returns an integer value that must be returned if it is non-zero.

Python provides a "Py_VISIT()" macro that automates calling visit functions. With "Py_VISIT()", we can minimize the amount of boilerplate in "Custom_traverse":

   static int
   Custom_traverse(CustomObject *self, visitproc visit, void *arg)
   {
       Py_VISIT(self->first);
       Py_VISIT(self->last);
       return 0;
   }

Note: The "tp_traverse" implementation must name its arguments exactly *visit* and *arg* in order to use "Py_VISIT()".
Second, we need to provide a method for clearing any subobjects that can participate in cycles:

   static int
   Custom_clear(CustomObject *self)
   {
       Py_CLEAR(self->first);
       Py_CLEAR(self->last);
       return 0;
   }

Notice the use of the "Py_CLEAR()" macro. It is the recommended and safe way to clear data attributes of arbitrary types while decrementing their reference counts. If you were to call "Py_XDECREF()" instead on the attribute before setting it to "NULL", there is a possibility that the attribute's destructor would call back into code that reads the attribute again (*especially* if there is a reference cycle).

Note: You could emulate "Py_CLEAR()" by writing:

      PyObject *tmp;
      tmp = self->first;
      self->first = NULL;
      Py_XDECREF(tmp);

  Nevertheless, it is much easier and less error-prone to always use "Py_CLEAR()" when deleting an attribute. Don't try to micro-optimize at the expense of robustness!

The deallocator "Custom_dealloc" may call arbitrary code when clearing attributes. This means the cyclic GC can be triggered inside the function. Since the GC assumes the reference count is not zero, we need to untrack the object from the GC by calling "PyObject_GC_UnTrack()" before clearing members. Here is our reimplemented deallocator using "PyObject_GC_UnTrack()" and "Custom_clear":

   static void
   Custom_dealloc(CustomObject *self)
   {
       PyObject_GC_UnTrack(self);
       Custom_clear(self);
       Py_TYPE(self)->tp_free((PyObject *) self);
   }

Finally, we add the "Py_TPFLAGS_HAVE_GC" flag to the class flags:

   .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC,

That's pretty much it. If we had written custom "tp_alloc" or "tp_free" handlers, we'd need to modify them for cyclic garbage collection. Most extensions will use the versions automatically provided.

2.5. Subclassing other types
============================

It is possible to create new extension types that are derived from existing types. It is easiest to inherit from the built-in types, since an extension can easily use the "PyTypeObject" it needs. It can be difficult to share these "PyTypeObject" structures between extension modules.

In this example we will create a "SubList" type that inherits from the built-in "list" type.
The new type will be completely compatible with regular lists, but will have an additional "increment()" method that increases an internal counter:

   >>> import sublist
   >>> s = sublist.SubList(range(3))
   >>> s.extend(s)
   >>> print(len(s))
   6
   >>> print(s.increment())
   1
   >>> print(s.increment())
   2

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   typedef struct {
       PyListObject list;
       int state;
   } SubListObject;

   static PyObject *
   SubList_increment(SubListObject *self, PyObject *unused)
   {
       self->state++;
       return PyLong_FromLong(self->state);
   }

   static PyMethodDef SubList_methods[] = {
       {"increment", (PyCFunction) SubList_increment, METH_NOARGS,
        PyDoc_STR("increment state counter")},
       {NULL},
   };

   static int
   SubList_init(SubListObject *self, PyObject *args, PyObject *kwds)
   {
       if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0)
           return -1;
       self->state = 0;
       return 0;
   }

   static PyTypeObject SubListType = {
       .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
       .tp_name = "sublist.SubList",
       .tp_doc = PyDoc_STR("SubList objects"),
       .tp_basicsize = sizeof(SubListObject),
       .tp_itemsize = 0,
       .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
       .tp_init = (initproc) SubList_init,
       .tp_methods = SubList_methods,
   };

   static int
   sublist_module_exec(PyObject *m)
   {
       SubListType.tp_base = &PyList_Type;
       if (PyType_Ready(&SubListType) < 0) {
           return -1;
       }
       if (PyModule_AddObjectRef(m, "SubList", (PyObject *) &SubListType) < 0) {
           return -1;
       }
       return 0;
   }

   static PyModuleDef_Slot sublist_module_slots[] = {
       {Py_mod_exec, sublist_module_exec},
       {Py_mod_multiple_interpreters, Py_MOD_MULTIPLE_INTERPRETERS_NOT_SUPPORTED},
       {0, NULL}
   };

   static PyModuleDef sublist_module = {
       .m_base = PyModuleDef_HEAD_INIT,
       .m_name = "sublist",
       .m_doc = "Example module that creates an extension type.",
       .m_size = 0,
       .m_slots = sublist_module_slots,
   };

   PyMODINIT_FUNC
   PyInit_sublist(void)
   {
       return PyModuleDef_Init(&sublist_module);
   }

As you can see, the source code closely resembles the "Custom" examples in previous sections. We will break down the main differences between them.

   typedef struct {
       PyListObject list;
       int state;
   } SubListObject;

The primary difference for derived type objects is that the base type's object structure must be the first value. The base type will already include the "PyObject_HEAD" at the beginning of its structure.

When a Python object is a "SubList" instance, its "PyObject *" pointer can be safely cast to both "PyListObject *" and "SubListObject *":

   static int
   SubList_init(SubListObject *self, PyObject *args, PyObject *kwds)
   {
       if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0)
           return -1;
       self->state = 0;
       return 0;
   }

We see above how to call through to the "__init__()" method of the base type.

This pattern is important when writing a type with custom "tp_new" and "tp_dealloc" members. The "tp_new" handler should not actually create the memory for the object with its "tp_alloc", but let the base class handle it by calling its own "tp_new".

The "PyTypeObject" struct supports a "tp_base" specifying the type's concrete base class. Due to cross-platform compiler issues, you can't fill that field directly with a reference to "PyList_Type"; it should be done in the "Py_mod_exec" function:

   static int
   sublist_module_exec(PyObject *m)
   {
       SubListType.tp_base = &PyList_Type;
       if (PyType_Ready(&SubListType) < 0) {
           return -1;
       }
       if (PyModule_AddObjectRef(m, "SubList", (PyObject *) &SubListType) < 0) {
           return -1;
       }
       return 0;
   }

Before calling "PyType_Ready()", the type structure must have the "tp_base" slot filled in.
When we are deriving an existing type, it is not necessary to fill out the "tp_alloc" slot with "PyType_GenericNew()" – the allocation function from the base type will be inherited.

After that, calling "PyType_Ready()" and adding the type object to the module is the same as with the basic "Custom" examples.

-[ Footnotes ]-

[1] This is true when we know that the object is a basic type, like a string or a float.

[2] We relied on this in the "tp_dealloc" handler in this example, because our type doesn't support garbage collection.

[3] We now know that the first and last members are strings, so perhaps we could be less careful about decrementing their reference counts; however, we accept instances of string subclasses. Even though deallocating normal strings won't call back into our objects, we can't guarantee that deallocating an instance of a string subclass won't call back into our objects.

[4] Also, even with our attributes restricted to string instances, the user could pass arbitrary "str" subclasses and therefore still create reference cycles.

5. Building C and C++ Extensions on Windows
*******************************************

This chapter briefly explains how to create a Windows extension module for Python using Microsoft Visual C++, and follows with more detailed background information on how it works. The explanatory material is useful for both the Windows programmer learning to build Python extensions and the Unix programmer interested in producing software which can be successfully built on both Unix and Windows.

Module authors are encouraged to use the setuptools approach for building extension modules, instead of the one described in this section. You will still need the C compiler that was used to build Python; typically Microsoft Visual C++.

Note: This chapter mentions a number of filenames that include an encoded Python version number. These filenames are represented with the version number shown as "XY"; in practice, "X" will be the major version number and "Y" will be the minor version number of the Python release you're working with. For example, if you are using Python 2.2.1, "XY" will actually be "22".

5.1. A Cookbook Approach
========================

There are two approaches to building extension modules on Windows, just as there are on Unix: use the "setuptools" package to control the build process, or do things manually. The setuptools approach works well for most extensions; documentation on using "setuptools" to build and package extension modules is available in Building C and C++ Extensions with setuptools.

If you find you really need to do things manually, it may be instructive to study the project file for the winsound standard library module.

5.2. Differences Between Unix and Windows
=========================================

Unix and Windows use completely different paradigms for run-time loading of code. Before you try to build a module that can be dynamically loaded, be aware of how your system works.

In Unix, a shared object (".so") file contains code to be used by the program, and also the names of functions and data that it expects to find in the program. When the file is joined to the program, all references to those functions and data in the file's code are changed to point to the actual locations in the program where the functions and data are placed in memory. This is basically a link operation.

In Windows, a dynamic-link library (".dll") file has no dangling references. Instead, an access to functions or data goes through a lookup table.
So the DLL code does not have to be fixed up at runtime to refer to the program's memory; instead, the code already uses the DLL's lookup table, and the lookup table is modified at runtime to point to the functions and data.

In Unix, there is only one type of library file (".a") which contains code from several object files (".o"). During the link step to create a shared object file (".so"), the linker may find that it doesn't know where an identifier is defined. The linker will look for it in the object files in the libraries; if it finds it, it will include all the code from that object file.

In Windows, there are two types of library, a static library and an import library (both called ".lib"). A static library is like a Unix ".a" file; it contains code to be included as necessary. An import library is basically used only to reassure the linker that a certain identifier is legal, and will be present in the program when the DLL is loaded. So the linker uses the information from the import library to build the lookup table for using identifiers that are not included in the DLL. When an application or a DLL is linked, an import library may be generated, which will need to be used for all future DLLs that depend on the symbols in the application or DLL.

Suppose you are building two dynamic-load modules, B and C, which should share another block of code A. On Unix, you would *not* pass "A.a" to the linker for "B.so" and "C.so"; that would cause it to be included twice, so that B and C would each have their own copy. In Windows, building "A.dll" will also build "A.lib". You *do* pass "A.lib" to the linker for B and C. "A.lib" does not contain code; it just contains information which will be used at runtime to access A's code.

In Windows, using an import library is sort of like using "import spam"; it gives you access to spam's names, but does not create a separate copy. On Unix, linking with a library is more like "from spam import *"; it does create a separate copy.

5.3. Using DLLs in Practice
===========================

Windows Python is built in Microsoft Visual C++; using other compilers may or may not work. The rest of this section is MSVC++ specific.

When creating DLLs in Windows, you must pass "pythonXY.lib" to the linker. To build two DLLs, spam and ni (which uses C functions found in spam), you could use these commands:

   cl /LD /I/python/include spam.c ../libs/pythonXY.lib
   cl /LD /I/python/include ni.c spam.lib ../libs/pythonXY.lib

The first command created three files: "spam.obj", "spam.dll" and "spam.lib". "Spam.dll" does not contain any Python functions (such as "PyArg_ParseTuple()"), but it does know how to find the Python code thanks to "pythonXY.lib".

The second command created "ni.dll" (and ".obj" and ".lib"), which knows how to find the necessary functions from spam, and also from the Python executable.

Not every identifier is exported to the lookup table. If you want any other modules (including Python) to be able to see your identifiers, you have to say "__declspec(dllexport)", as in "void __declspec(dllexport) initspam(void)" or "PyObject __declspec(dllexport) *NiGetSpamData(void)".

Developer Studio will throw in a lot of import libraries that you do not really need, adding about 100K to your executable. To get rid of them, use the Project Settings dialog, Link tab, to specify *ignore default libraries*. Add the correct "msvcrt*xx*.lib" to the list of libraries.

Design and History FAQ
**********************

Why does Python use indentation for grouping of statements?
===========================================================

Guido van Rossum believes that using indentation for grouping is extremely elegant and contributes a lot to the clarity of the average Python program. Most people learn to love this feature after a while.

Since there are no begin/end brackets there cannot be a disagreement between grouping perceived by the parser and the human reader. Occasionally C programmers will encounter a fragment of code like this:

   if (x <= y)
           x++;
           y--;
   z++;

Only the "x++" statement is executed if the condition is true, but the indentation leads many to believe otherwise. Even experienced C programmers will sometimes stare at it a long time wondering why "y" is being decremented even for "x > y".

Because there are no begin/end brackets, Python is much less prone to coding-style conflicts. In C there are many different ways to place the braces. After becoming used to reading and writing code using a particular style, it is normal to feel somewhat uneasy when reading (or being required to write) in a different one.

Many coding styles place begin/end brackets on a line by themselves. This makes programs considerably longer and wastes valuable screen space, making it harder to get a good overview of a program. Ideally, a function should fit on one screen (say, 20–30 lines). 20 lines of Python can do a lot more work than 20 lines of C. This is not solely due to the lack of begin/end brackets – the lack of declarations and the high-level data types are also responsible – but the indentation-based syntax certainly helps.

Why am I getting strange results with simple arithmetic operations?
===================================================================

See the next question.

Why are floating-point calculations so inaccurate?
==================================================

Users are often surprised by results like this:

   >>> 1.2 - 1.0
   0.19999999999999996

and think it is a bug in Python. It's not. This has little to do with Python, and much more to do with how the underlying platform handles floating-point numbers.

The "float" type in CPython uses a C "double" for storage. A "float" object's value is stored in binary floating-point with a fixed precision (typically 53 bits) and Python uses C operations, which in turn rely on the hardware implementation in the processor, to perform floating-point operations. This means that as far as floating-point operations are concerned, Python behaves like many popular languages including C and Java.

Many numbers that can be written easily in decimal notation cannot be expressed exactly in binary floating point. For example, after:

   >>> x = 1.2

the value stored for "x" is a (very good) approximation to the decimal value "1.2", but is not exactly equal to it. On a typical machine, the actual stored value is:

   1.0011001100110011001100110011001100110011001100110011 (binary)

which is exactly:

   1.1999999999999999555910790149937383830547332763671875 (decimal)

The typical precision of 53 bits provides Python floats with 15–16 decimal digits of accuracy.

For a fuller explanation, please see the floating-point arithmetic chapter in the Python tutorial.

Why are Python strings immutable?
=================================

There are several advantages.

One is performance: knowing that a string is immutable means we can allocate space for it at creation time, and the storage requirements are fixed and unchanging. This is also one of the reasons for the distinction between tuples and lists.
Another advantage is that strings in Python are considered as “elemental” as numbers. No amount of activity will change the value 8 to anything else, and in Python, no amount of activity will change the string “eight” to anything else.

Why must ‘self’ be used explicitly in method definitions and calls?
===================================================================

The idea was borrowed from Modula-3. It turns out to be very useful, for a variety of reasons.

First, it's more obvious that you are using a method or instance attribute instead of a local variable. Reading "self.x" or "self.meth()" makes it absolutely clear that an instance variable or method is used even if you don't know the class definition by heart. In C++, you can sort of tell by the lack of a local variable declaration (assuming globals are rare or easily recognizable) – but in Python, there are no local variable declarations, so you'd have to look up the class definition to be sure. Some C++ and Java coding standards call for instance attributes to have an "m_" prefix, so this explicitness is still useful in those languages, too.

Second, it means that no special syntax is necessary if you want to explicitly reference or call the method from a particular class. In C++, if you want to use a method from a base class which is overridden in a derived class, you have to use the "::" operator – in Python you can write "baseclass.methodname(self, <argument list>)". This is particularly useful for "__init__()" methods, and in general in cases where a derived class method wants to extend the base class method of the same name and thus has to call the base class method somehow.

Finally, for instance variables it solves a syntactic problem with assignment: since local variables in Python are (by definition!) those variables to which a value is assigned in a function body (and that aren't explicitly declared global), there has to be some way to tell the interpreter that an assignment was meant to assign to an instance variable instead of to a local variable, and it should preferably be syntactic (for efficiency reasons). C++ does this through declarations, but Python doesn't have declarations and it would be a pity having to introduce them just for this purpose. Using the explicit "self.var" solves this nicely. Similarly, for using instance variables, having to write "self.var" means that references to unqualified names inside a method don't have to search the instance's dictionaries. To put it another way, local variables and instance variables live in two different namespaces, and you need to tell Python which namespace to use.

Why can’t I use an assignment in an expression?
===============================================

Starting in Python 3.8, you can!

Assignment expressions using the walrus operator ":=" assign a variable in an expression:

   while chunk := fp.read(200):
       print(chunk)

See **PEP 572** for more information.

Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?
================================================================================================================

As Guido said:

   (a) For some operations, prefix notation just reads better than postfix – prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the ease with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
   (b) When I read code that says len(x) I *know* that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn't a file has a write() method.

   -- https://mail.python.org/pipermail/python-3000/2006-November/004643.html

Why is join() a string method instead of a list or tuple method?
================================================================

Strings became much more like other standard types starting in Python 1.6, when methods were added which give the same functionality that has always been available using the functions of the string module. Most of these new methods have been widely accepted, but the one which appears to make some programmers feel uncomfortable is:

   ", ".join(['1', '2', '4', '8', '16'])

which gives the result:

   "1, 2, 4, 8, 16"

There are two common arguments against this usage.

The first runs along the lines of: “It looks really ugly using a method of a string literal (string constant)”, to which the answer is that it might, but a string literal is just a fixed value. If the methods are to be allowed on names bound to strings there is no logical reason to make them unavailable on literals.

The second objection is typically cast as: “I am really telling a sequence to join its members together with a string constant”. Sadly, you aren't. For some reason there seems to be much less difficulty with having "split()" as a string method, since in that case it is easy to see that

   "1, 2, 4, 8, 16".split(", ")

is an instruction to a string literal to return the substrings delimited by the given separator (or, by default, arbitrary runs of white space).

"join()" is a string method because in using it you are telling the separator string to iterate over a sequence of strings and insert itself between adjacent elements. This method can be used with any argument which obeys the rules for sequence objects, including any new classes you might define yourself. Similar methods exist for bytes and bytearray objects.

How fast are exceptions?
========================

A "try"/"except" block is extremely efficient if no exceptions are raised. Actually catching an exception is expensive. In versions of Python prior to 2.0 it was common to use this idiom:

   try:
       value = mydict[key]
   except KeyError:
       mydict[key] = getvalue(key)
       value = mydict[key]

This only made sense when you expected the dict to have the key almost all the time. If that wasn't the case, you coded it like this:

   if key in mydict:
       value = mydict[key]
   else:
       value = mydict[key] = getvalue(key)

For this specific case, you could also use "value = dict.setdefault(key, getvalue(key))", but only if the "getvalue()" call is cheap enough because it is evaluated in all cases.

Why isn’t there a switch or case statement in Python?
=====================================================

In general, structured switch statements execute one block of code when an expression has a particular value or set of values. Since Python 3.10 one can easily match literal values, or constants within a namespace, with a "match ... case" statement. An older alternative is a sequence of "if... elif... elif... else".
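For instance, a small dispatch that might otherwise be an "if ... elif" chain can be written as follows (an illustrative sketch; the command names are invented for the example):

   def handle(command):
       match command:
           case "start":
               return "starting"
           case "stop" | "halt":      # several literals can share a block
               return "stopping"
           case _:                    # the wildcard plays the role of "default"
               return "unknown command"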
For cases where you need to choose from a very large number of possibilities, you can create a dictionary mapping case values to functions to call. For example:

   functions = {'a': function_1,
                'b': function_2,
                'c': self.method_1}

   func = functions[value]
   func()

For calling methods on objects, you can simplify yet further by using the "getattr()" built-in to retrieve methods with a particular name:

   class MyVisitor:
       def visit_a(self):
           ...

       def dispatch(self, value):
           method_name = 'visit_' + str(value)
           method = getattr(self, method_name)
           method()

It's suggested that you use a prefix for the method names, such as "visit_" in this example. Without such a prefix, if values are coming from an untrusted source, an attacker would be able to call any method on your object.

Imitating switch with fallthrough, as with C's switch-case-default, is possible, much harder, and less needed.

Can’t you emulate threads in the interpreter instead of relying on an OS-specific thread implementation?
========================================================================================================

Answer 1: Unfortunately, the interpreter pushes at least one C stack frame for each Python stack frame. Also, extensions can call back into Python at almost random moments. Therefore, a complete threads implementation requires thread support for C.

Answer 2: Fortunately, there is Stackless Python, which has a completely redesigned interpreter loop that avoids the C stack.

Why can’t lambda expressions contain statements?
================================================

Python lambda expressions cannot contain statements because Python's syntactic framework can't handle statements nested inside expressions. However, in Python, this is not a serious problem. Unlike lambda forms in other languages, where they add functionality, Python lambdas are only a shorthand notation if you're too lazy to define a function.

Functions are already first class objects in Python, and can be declared in a local scope. Therefore the only advantage of using a lambda instead of a locally defined function is that you don't need to invent a name for the function – but that's just a local variable to which the function object (which is exactly the same type of object that a lambda expression yields) is assigned!

Can Python be compiled to machine code, C or some other language?
=================================================================

Cython compiles a modified version of Python with optional annotations into C extensions. Nuitka is an up-and-coming compiler of Python into C++ code, aiming to support the full Python language.

How does Python manage memory?
==============================

The details of Python memory management depend on the implementation. The standard implementation of Python, *CPython*, uses reference counting to detect inaccessible objects, and another mechanism to collect reference cycles, periodically executing a cycle detection algorithm which looks for inaccessible cycles and deletes the objects involved. The "gc" module provides functions to perform a garbage collection, obtain debugging statistics, and tune the collector's parameters.

Other implementations (such as Jython or PyPy), however, can rely on a different mechanism such as a full-blown garbage collector. This difference can cause some subtle porting problems if your Python code depends on the behavior of the reference counting implementation.
In some Python implementations, the following code (which is fine in CPython) will probably run out of file descriptors:

   for file in very_long_list_of_files:
       f = open(file)
       c = f.read(1)

Indeed, using CPython's reference counting and destructor scheme, each new assignment to "f" closes the previous file. With a traditional GC, however, those file objects will only get collected (and closed) at varying and possibly long intervals.

If you want to write code that will work with any Python implementation, you should explicitly close the file or use the "with" statement; this will work regardless of memory management scheme:

   for file in very_long_list_of_files:
       with open(file) as f:
           c = f.read(1)

Why doesn’t CPython use a more traditional garbage collection scheme?
=====================================================================

For one thing, this is not a C standard feature and hence it's not portable. (Yes, we know about the Boehm GC library. It has bits of assembler code for *most* common platforms, not for all of them, and although it is mostly transparent, it isn't completely transparent; patches are required to get Python to work with it.)

Traditional GC also becomes a problem when Python is embedded into other applications. While in a standalone Python it's fine to replace the standard "malloc()" and "free()" with versions provided by the GC library, an application embedding Python may want to have its *own* substitute for "malloc()" and "free()", and may not want Python's. Right now, CPython works with anything that implements "malloc()" and "free()" properly.

Why isn’t all memory freed when CPython exits?
==============================================

Objects referenced from the global namespaces of Python modules are not always deallocated when Python exits. This may happen if there are circular references. There are also certain bits of memory that are allocated by the C library that are impossible to free (e.g. a tool like Purify will complain about these). Python is, however, aggressive about cleaning up memory on exit and does try to destroy every single object.

If you want to force Python to delete certain things on deallocation use the "atexit" module to run a function that will force those deletions.

Why are there separate tuple and list data types?
=================================================

Lists and tuples, while similar in many respects, are generally used in fundamentally different ways. Tuples can be thought of as being similar to Pascal "records" or C "structs"; they're small collections of related data which may be of different types which are operated on as a group. For example, a Cartesian coordinate is appropriately represented as a tuple of two or three numbers.

Lists, on the other hand, are more like arrays in other languages. They tend to hold a varying number of objects all of which have the same type and which are operated on one-by-one. For example, "os.listdir('.')" returns a list of strings representing the files in the current directory. Functions which operate on this output would generally not break if you added another file or two to the directory.

Tuples are immutable, meaning that once a tuple has been created, you can't replace any of its elements with a new value. Lists are mutable, meaning that you can always change a list's elements. Only immutable elements can be used as dictionary keys, and hence only tuples and not lists can be used as keys.

How are lists implemented in CPython?
=====================================

CPython's lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array's length in a list head structure.

This makes indexing a list "a[i]" an operation whose cost is independent of the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don't require an actual resize.

How are dictionaries implemented in CPython?
============================================

CPython's dictionaries are implemented as resizable hash tables. Compared to B-trees, this gives better performance for lookup (the most common operation by far) under most circumstances, and the implementation is simpler.

Dictionaries work by computing a hash code for each key stored in the dictionary using the "hash()" built-in function. The hash code varies widely depending on the key and a per-process seed; for example, "'Python'" could hash to "-539294296" while "'python'", a string that differs by a single bit, could hash to "1142331976". The hash code is then used to calculate a location in an internal array where the value will be stored. Assuming that you're storing keys that all have different hash values, this means that dictionaries take constant time – *O*(1), in Big-O notation – to retrieve a key.

Why must dictionary keys be immutable?
======================================

The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can't tell that it was being used as a dictionary key, it can't move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won't be found because its hash value is different. If you tried to look up the old value it wouldn't be found either, because the value of the object found in that hash bin would be different.

If you want a dictionary indexed with a list, simply convert the list to a tuple first; the function "tuple(L)" creates a tuple with the same entries as the list "L". Tuples are immutable and can therefore be used as dictionary keys.

Some unacceptable solutions that have been proposed:

* Hash lists by their address (object ID). This doesn't work because if you construct a new list with the same value it won't be found; e.g.:

     mydict = {[1, 2]: '12'}
     print(mydict[[1, 2]])

  would raise a "KeyError" exception because the id of the "[1, 2]" used in the second line differs from that in the first line. In other words, dictionary keys should be compared using "==", not using "is".

* Make a copy when using a list as a key. This doesn't work because the list, being a mutable object, could contain a reference to itself, and then the copying code would run into an infinite loop.

* Allow lists as keys but tell the user not to modify them. This would allow a class of hard-to-track bugs in programs when you forgot or modified a list by accident. It also invalidates an important invariant of dictionaries: every value in "d.keys()" is usable as a key of the dictionary.

* Mark lists as read-only once they are used as a dictionary key.
The problem is that it's not just the top-level object that could change its value; you could use a tuple containing a list as a key. Entering anything as a key into a dictionary would require marking all objects reachable from there as read-only – and again, self-referential objects could cause an infinite loop.

There is a trick to get around this if you need to, but use it at your own risk: You can wrap a mutable structure inside a class instance which has both a "__eq__()" and a "__hash__()" method. You must then make sure that the hash value for all such wrapper objects that reside in a dictionary (or other hash based structure), remain fixed while the object is in the dictionary (or other structure).

   class ListWrapper:
       def __init__(self, the_list):
           self.the_list = the_list

       def __eq__(self, other):
           return self.the_list == other.the_list

       def __hash__(self):
           l = self.the_list
           result = 98767 - len(l)*555
           for i, el in enumerate(l):
               try:
                   result = result + (hash(el) % 9999999) * 1001 + i
               except Exception:
                   result = (result % 7777777) + i * 333
           return result

Note that the hash computation is complicated by the possibility that some members of the list may be unhashable and also by the possibility of arithmetic overflow.

Furthermore it must always be the case that if "o1 == o2" (i.e. "o1.__eq__(o2) is True") then "hash(o1) == hash(o2)" (i.e., "o1.__hash__() == o2.__hash__()"), regardless of whether the object is in a dictionary or not. If you fail to meet these restrictions dictionaries and other hash based structures will misbehave.

In the case of "ListWrapper", whenever the wrapper object is in a dictionary the wrapped list must not change to avoid anomalies. Don't do this unless you are prepared to think hard about the requirements and the consequences of not meeting them correctly. Consider yourself warned.

Why doesn’t list.sort() return the sorted list?
===============================================

In situations where performance matters, making a copy of the list just to sort it would be wasteful. Therefore, "list.sort()" sorts the list in place. In order to remind you of that fact, it does not return the sorted list. This way, you won't be fooled into accidentally overwriting a list when you need a sorted copy but also need to keep the unsorted version around.

If you want to return a new list, use the built-in "sorted()" function instead. This function creates a new list from a provided iterable, sorts it and returns it. For example, here's how to iterate over the keys of a dictionary in sorted order:

   for key in sorted(mydict):
       ...  # do whatever with mydict[key]...

How do you specify and enforce an interface spec in Python?
===========================================================

An interface specification for a module as provided by languages such as C++ and Java describes the prototypes for the methods and functions of the module. Many feel that compile-time enforcement of interface specifications helps in the construction of large programs.

Python 2.6 adds an "abc" module that lets you define Abstract Base Classes (ABCs). You can then use "isinstance()" and "issubclass()" to check whether an instance or a class implements a particular ABC. The "collections.abc" module defines a set of useful ABCs such as "Iterable", "Container", and "MutableMapping". (A short sketch of this approach appears below.)

For Python, many of the advantages of interface specifications can be obtained by an appropriate test discipline for components.
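Here is that sketch: a minimal illustration of defining and checking an ABC. The class names are invented for the example:

   import abc

   class Reader(abc.ABC):                 # an invented example ABC
       @abc.abstractmethod
       def read(self, n): ...

   class StringReader(Reader):            # a concrete implementation
       def __init__(self, s):
           self._s = s
       def read(self, n):
           head, self._s = self._s[:n], self._s[n:]
           return head

   assert issubclass(StringReader, Reader)
   assert isinstance(StringReader("spam"), Reader)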
A good test suite for a module can both provide a regression test and serve as a module interface specification and a set of examples. Many Python modules can be run as a script to provide a simple “self test.” Even modules which use complex external interfaces can often be tested in isolation using trivial “stub” emulations of the external interface. The "doctest" and "unittest" modules or third-party test frameworks can be used to construct exhaustive test suites that exercise every line of code in a module.

An appropriate testing discipline can help build large complex applications in Python as well as having interface specifications would. In fact, it can be better because an interface specification cannot test certain properties of a program. For example, the "list.append()" method is expected to add new elements to the end of some internal list; an interface specification cannot test that your "list.append()" implementation will actually do this correctly, but it's trivial to check this property in a test suite.

Writing test suites is very helpful, and you might want to design your code to make it easily tested. One increasingly popular technique, test-driven development, calls for writing parts of the test suite first, before you write any of the actual code. Of course Python allows you to be sloppy and not write test cases at all.

Why is there no goto?
=====================

In the 1970s people realized that unrestricted goto could lead to messy “spaghetti” code that was hard to understand and revise. In a high-level language, it is also unneeded as long as there are ways to branch (in Python, with "if" statements and "or", "and", and "if"/"else" expressions) and loop (with "while" and "for" statements, possibly containing "continue" and "break").

One can also use exceptions to provide a “structured goto” that works even across function calls. Many feel that exceptions can conveniently emulate all reasonable uses of the "go" or "goto" constructs of C, Fortran, and other languages. For example:

   class label(Exception): pass  # declare a label

   try:
       ...
       if condition: raise label()  # goto label
       ...
   except label:  # where to goto
       pass
   ...

This doesn't allow you to jump into the middle of a loop, but that's usually considered an abuse of "goto" anyway. Use sparingly.

Why can’t raw strings (r-strings) end with a backslash?
=======================================================

More precisely, they can't end with an odd number of backslashes: the unpaired backslash at the end escapes the closing quote character, leaving an unterminated string.

Raw strings were designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing. Such processors consider an unmatched trailing backslash to be an error anyway, so raw strings disallow that. In return, they allow you to pass on the string quote character by escaping it with a backslash. These rules work well when r-strings are used for their intended purpose.

If you're trying to build Windows pathnames, note that all Windows system calls accept forward slashes too:

   f = open("/mydir/file.txt")  # works fine!

If you're trying to build a pathname for a DOS command, try e.g. one of

   dir = r"\this\is\my\dos\dir" "\\"
   dir = r"\this\is\my\dos\dir\ "[:-1]
   dir = "\\this\\is\\my\\dos\\dir\\"

Why doesn’t Python have a “with” statement for attribute assignments?
=====================================================================

Python has a "with" statement that wraps the execution of a block, calling code on the entrance and exit from the block. Some languages have a construct that looks like this:

   with obj:
       a = 1               # equivalent to obj.a = 1
       total = total + 1   # obj.total = obj.total + 1

In Python, such a construct would be ambiguous.

Other languages, such as Object Pascal, Delphi, and C++, use static types, so it's possible to know, in an unambiguous way, what member is being assigned to. This is the main point of static typing – the compiler *always* knows the scope of every variable at compile time.

Python uses dynamic types. It is impossible to know in advance which attribute will be referenced at runtime. Member attributes may be added or removed from objects on the fly. This makes it impossible to know, from a simple reading, what attribute is being referenced: a local one, a global one, or a member attribute?

For instance, take the following incomplete snippet:

   def foo(a):
       with a:
           print(x)

The snippet assumes that "a" must have a member attribute called "x". However, there is nothing in Python that tells the interpreter this. What should happen if "a" is, let us say, an integer? If there is a global variable named "x", will it be used inside the "with" block? As you see, the dynamic nature of Python makes such choices much harder.

The primary benefit of "with" and similar language features (reduction of code volume) can, however, easily be achieved in Python by assignment. Instead of:

   function(args).mydict[index][index].a = 21
   function(args).mydict[index][index].b = 42
   function(args).mydict[index][index].c = 63

write this:

   ref = function(args).mydict[index][index]
   ref.a = 21
   ref.b = 42
   ref.c = 63

This also has the side-effect of increasing execution speed because name bindings are resolved at run-time in Python, and the second version only needs to perform the resolution once.

Similar proposals that would introduce syntax to further reduce code volume, such as using a ‘leading dot’, have been rejected in favour of explicitness (see https://mail.python.org/pipermail/python-ideas/2016-May/040070.html).

Why don’t generators support the with statement?
================================================

For technical reasons, a generator used directly as a context manager would not work correctly. When, as is most common, a generator is used as an iterator run to completion, no closing is needed. When it is, wrap it as "contextlib.closing(generator)" in the "with" statement.

Why are colons required for the if/while/def/class statements?
==============================================================

The colon is required primarily to enhance readability (one of the results of the experimental ABC language). Consider this:

   if a == b
       print(a)

versus

   if a == b:
       print(a)

Notice how the second one is slightly easier to read. Notice further how a colon sets off the example in this FAQ answer; it's a standard usage in English.

Another minor reason is that the colon makes it easier for editors with syntax highlighting; they can look for colons to decide when indentation needs to be increased instead of having to do a more elaborate parsing of the program text.

Why does Python allow commas at the end of lists and tuples?
============================================================ Python lets you add a trailing comma at the end of lists, tuples, and dictionaries: [1, 2, 3,] ('a', 'b', 'c',) d = { "A": [1, 5], "B": [6, 7], # last trailing comma is optional but good style } There are several reasons to allow this. When you have a literal value for a list, tuple, or dictionary spread across multiple lines, it’s easier to add more elements because you don’t have to remember to add a comma to the previous line. The lines can also be reordered without creating a syntax error. Accidentally omitting the comma can lead to errors that are hard to diagnose. For example: x = [ "fee", "fie" "foo", "fum" ] This list looks like it has four elements, but it actually contains three: “fee”, “fiefoo” and “fum”. Always adding the comma avoids this source of error. Allowing the trailing comma may also make programmatic code generation easier. Extending/Embedding FAQ *********************** Can I create my own functions in C? =================================== Yes, you can create built-in modules containing functions, variables, exceptions and even new types in C. This is explained in the document Extending and Embedding the Python Interpreter. Most intermediate or advanced Python books will also cover this topic. Can I create my own functions in C++? ===================================== Yes, using the C compatibility features found in C++. Place "extern "C" { ... }" around the Python include files and put "extern "C"" before each function that is going to be called by the Python interpreter. Global or static C++ objects with constructors are probably not a good idea. Writing C is hard; are there any alternatives? ============================================== There are a number of alternatives to writing your own C extensions, depending on what you’re trying to do. Recommended third party tools offer both simpler and more sophisticated approaches to creating C and C++ extensions for Python. How can I execute arbitrary Python statements from C? ===================================================== The highest-level function to do this is "PyRun_SimpleString()" which takes a single string argument to be executed in the context of the module "__main__" and returns "0" for success and "-1" when an exception occurred (including "SyntaxError"). If you want more control, use "PyRun_String()"; see the source for "PyRun_SimpleString()" in "Python/pythonrun.c". How can I evaluate an arbitrary Python expression from C? ========================================================= Call the function "PyRun_String()" from the previous question with the start symbol "Py_eval_input"; it parses an expression, evaluates it and returns its value. How do I extract C values from a Python object? =============================================== That depends on the object’s type. If it’s a tuple, "PyTuple_Size()" returns its length and "PyTuple_GetItem()" returns the item at a specified index. Lists have similar functions, "PyList_Size()" and "PyList_GetItem()". For bytes, "PyBytes_Size()" returns its length and "PyBytes_AsStringAndSize()" provides a pointer to its value and its length. Note that Python bytes objects may contain null bytes so C’s "strlen()" should not be used. To test the type of an object, first make sure it isn’t "NULL", and then use "PyBytes_Check()", "PyTuple_Check()", "PyList_Check()", etc. 
There is also a high-level API to Python objects which is provided by the so-called ‘abstract’ interface – read "Include/abstract.h" for further details. It allows interfacing with any kind of Python sequence using calls like "PySequence_Length()", "PySequence_GetItem()", etc. as well as many other useful protocols such as numbers ("PyNumber_Index()" et al.) and mappings in the PyMapping APIs. How do I use Py_BuildValue() to create a tuple of arbitrary length? =================================================================== You can’t. Use "PyTuple_Pack()" instead. How do I call an object’s method from C? ======================================== The "PyObject_CallMethod()" function can be used to call an arbitrary method of an object. The parameters are the object, the name of the method to call, a format string like that used with "Py_BuildValue()", and the argument values: PyObject * PyObject_CallMethod(PyObject *object, const char *method_name, const char *arg_format, ...); This works for any object that has methods – whether built-in or user-defined. You are responsible for eventually "Py_DECREF()"-ing the return value. To call, e.g., a file object’s “seek” method with arguments 10, 0 (assuming the file object pointer is “f”): res = PyObject_CallMethod(f, "seek", "(ii)", 10, 0); if (res == NULL) { ... an exception occurred ... } else { Py_DECREF(res); } Note that since "PyObject_CallObject()" *always* wants a tuple for the argument list, to call a function without arguments, pass “()” for the format, and to call a function with one argument, surround the argument in parentheses, e.g. “(i)”. How do I catch the output from PyErr_Print() (or anything that prints to stdout/stderr)? ======================================================================================== In Python code, define an object that supports the "write()" method. Assign this object to "sys.stdout" and "sys.stderr". Call print_error, or just allow the standard traceback mechanism to work. Then, the output will go wherever your "write()" method sends it. The easiest way to do this is to use the "io.StringIO" class: >>> import io, sys >>> sys.stdout = io.StringIO() >>> print('foo') >>> print('hello world!') >>> sys.stderr.write(sys.stdout.getvalue()) foo hello world! A custom object to do the same would look like this: >>> import io, sys >>> class StdoutCatcher(io.TextIOBase): ... def __init__(self): ... self.data = [] ... def write(self, stuff): ... self.data.append(stuff) ... >>> import sys >>> sys.stdout = StdoutCatcher() >>> print('foo') >>> print('hello world!') >>> sys.stderr.write(''.join(sys.stdout.data)) foo hello world! How do I access a module written in Python from C? ================================================== You can get a pointer to the module object as follows: module = PyImport_ImportModule("<modulename>"); If the module hasn’t been imported yet (i.e. it is not yet present in "sys.modules"), this initializes the module; otherwise it simply returns the value of "sys.modules["<modulename>"]". Note that it doesn’t enter the module into any namespace – it only ensures it has been initialized and is stored in "sys.modules". You can then access the module’s attributes (i.e. any name defined in the module) as follows: attr = PyObject_GetAttrString(module, "<attrname>"); Calling "PyObject_SetAttrString()" to assign to variables in the module also works. How do I interface to C++ objects from Python? ============================================== Depending on your requirements, there are many approaches.
To do this manually, begin by reading the “Extending and Embedding” document. Realize that for the Python run-time system, there isn’t a whole lot of difference between C and C++ – so the strategy of building a new Python type around a C structure (pointer) type will also work for C++ objects. For C++ libraries, see Writing C is hard; are there any alternatives?. I added a module using the Setup file and the make fails; why? ============================================================== Setup must end in a newline; if there is no newline there, the build process fails. (Fixing this requires some ugly shell script hackery, and this bug is so minor that it doesn’t seem worth the effort.) How do I debug an extension? ============================ When using GDB with dynamically loaded extensions, you can’t set a breakpoint in your extension until your extension is loaded. In your ".gdbinit" file (or interactively), add the command: br _PyImport_LoadDynamicModule Then, when you run GDB: $ gdb /local/bin/python (gdb) run myscript.py (gdb) continue # repeat until your extension is loaded (gdb) finish # so that your extension is loaded (gdb) br myfunction.c:50 (gdb) continue I want to compile a Python module on my Linux system, but some files are missing. Why? ====================================================================================== Most packaged versions of Python omit some files required for compiling Python extensions. For Red Hat, install the python3-devel RPM to get the necessary files. For Debian, run "apt-get install python3-dev". How do I tell “incomplete input” from “invalid input”? ====================================================== Sometimes you want to emulate the Python interactive interpreter’s behavior, where it gives you a continuation prompt when the input is incomplete (e.g. you typed the start of an “if” statement or you didn’t close your parentheses or triple string quotes), but it gives you a syntax error message immediately when the input is invalid. In Python you can use the "codeop" module, which approximates the parser’s behavior sufficiently. IDLE uses this, for example. The easiest way to do it in C is to call "PyRun_InteractiveLoop()" (perhaps in a separate thread) and let the Python interpreter handle the input for you. You can also set "PyOS_ReadlineFunctionPointer" to point at your custom input function. See "Modules/readline.c" and "Parser/myreadline.c" for more hints. How do I find undefined g++ symbols __builtin_new or __pure_virtual? ==================================================================== To dynamically load g++ extension modules, you must recompile Python, relink it using g++ (change LINKCC in the Python Modules Makefile), and link your extension module using g++ (e.g., "g++ -shared -o mymodule.so mymodule.o"). Can I create an object class with some methods implemented in C and others in Python (e.g. through inheritance)? ================================================================================================================ Yes, you can inherit from built-in classes such as "int", "list", "dict", etc. The Boost Python Library (BPL, https://www.boost.org/libs/python/doc/index.html) provides a way of doing this from C++ (i.e. you can inherit from an extension class written in C++ using the BPL). General Python FAQ ****************** General Information =================== What is Python? --------------- Python is an interpreted, interactive, object-oriented programming language.
It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. It supports multiple programming paradigms beyond object-oriented programming, such as procedural and functional programming. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Finally, Python is portable: it runs on many Unix variants including Linux and macOS, and on Windows. To find out more, start with The Python Tutorial. The Beginner’s Guide to Python links to other introductory tutorials and resources for learning Python. What is the Python Software Foundation? --------------------------------------- The Python Software Foundation is an independent non-profit organization that holds the copyright on Python versions 2.1 and newer. The PSF’s mission is to advance open source technology related to the Python programming language and to publicize the use of Python. The PSF’s home page is at https://www.python.org/psf/. Donations to the PSF are tax-exempt in the US. If you use Python and find it helpful, please contribute via the PSF donation page. Are there copyright restrictions on the use of Python? ------------------------------------------------------ You can do anything you want with the source, as long as you leave the copyrights in and display those copyrights in any documentation about Python that you produce. If you honor the copyright rules, it’s OK to use Python for commercial use, to sell copies of Python in source or binary form (modified or unmodified), or to sell products that incorporate Python in some form. We would still like to know about all commercial use of Python, of course. See the license page to find further explanations and the full text of the PSF License. The Python logo is trademarked, and in certain cases permission is required to use it. Consult the Trademark Usage Policy for more information. Why was Python created in the first place? ------------------------------------------ Here’s a *very* brief summary of what started it all, written by Guido van Rossum: I had extensive experience with implementing an interpreted language in the ABC group at CWI, and from working with this group I had learned a lot about language design. This is the origin of many Python features, including the use of indentation for statement grouping and the inclusion of very-high-level data types (although the details are all different in Python). I had a number of gripes about the ABC language, but also liked many of its features. It was impossible to extend the ABC language (or its implementation) to remedy my complaints – in fact its lack of extensibility was one of its biggest problems. I had some experience with using Modula-2+ and talked with the designers of Modula-3 and read the Modula-3 report. Modula-3 is the origin of the syntax and semantics used for exceptions, and some other Python features. I was working in the Amoeba distributed operating system group at CWI. We needed a better way to do system administration than by writing either C programs or Bourne shell scripts, since Amoeba had its own system call interface which wasn’t easily accessible from the Bourne shell. My experience with error handling in Amoeba made me acutely aware of the importance of exceptions as a programming language feature. 
It occurred to me that a scripting language with a syntax like ABC but with access to the Amoeba system calls would fill the need. I realized that it would be foolish to write an Amoeba-specific language, so I decided that I needed a language that was generally extensible. During the 1989 Christmas holidays, I had a lot of time on my hands, so I decided to give it a try. During the next year, while still mostly working on it in my own time, Python was used in the Amoeba project with increasing success, and the feedback from colleagues made me add many early improvements. In February 1991, after just over a year of development, I decided to post to USENET. The rest is in the "Misc/HISTORY" file. What is Python good for? ------------------------ Python is a high-level general-purpose programming language that can be applied to many different classes of problems. The language comes with a large standard library that covers areas such as string processing (regular expressions, Unicode, calculating differences between files), internet protocols (HTTP, FTP, SMTP, XML-RPC, POP, IMAP), software engineering (unit testing, logging, profiling, parsing Python code), and operating system interfaces (system calls, filesystems, TCP/IP sockets). Look at the table of contents for The Python Standard Library to get an idea of what’s available. A wide variety of third-party extensions are also available. Consult the Python Package Index to find packages of interest to you. How does the Python version numbering scheme work? -------------------------------------------------- Python versions are numbered “A.B.C” or “A.B”: * *A* is the major version number – it is only incremented for really major changes in the language. * *B* is the minor version number – it is incremented for less earth-shattering changes. * *C* is the micro version number – it is incremented for each bugfix release. Not all releases are bugfix releases. In the run-up to a new feature release, a series of development releases are made, denoted as alpha, beta, or release candidate. Alphas are early releases in which interfaces aren’t yet finalized; it’s not unexpected to see an interface change between two alpha releases. Betas are more stable, preserving existing interfaces but possibly adding new modules, and release candidates are frozen, making no changes except as needed to fix critical bugs. Alpha, beta and release candidate versions have an additional suffix: * The suffix for an alpha version is “aN” for some small number *N*. * The suffix for a beta version is “bN” for some small number *N*. * The suffix for a release candidate version is “rcN” for some small number *N*. In other words, all versions labeled *2.0aN* precede the versions labeled *2.0bN*, which precede versions labeled *2.0rcN*, and *those* precede 2.0. You may also find version numbers with a “+” suffix, e.g. “2.2+”. These are unreleased versions, built directly from the CPython development repository. In practice, after a final minor release is made, the version is incremented to the next minor version, which becomes the “a0” version, e.g. “2.4a0”. See the Developer’s Guide for more information about the development cycle, and **PEP 387** to learn more about Python’s backward compatibility policy. See also the documentation for "sys.version", "sys.hexversion", and "sys.version_info".
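For example, you can check these components at run time with "sys.version_info"; a quick sketch (the values shown in the comments depend on your interpreter):

   import sys

   # A named tuple: (major, minor, micro, releaselevel, serial);
   # releaselevel is 'alpha', 'beta', 'candidate', or 'final'.
   print(sys.version_info)

   # Tuple comparison makes version checks concise.
   if sys.version_info >= (3, 11):
       print("running on Python 3.11 or newer")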
How do I obtain a copy of the Python source? -------------------------------------------- The latest Python source distribution is always available from python.org, at https://www.python.org/downloads/. The latest development sources can be obtained at https://github.com/python/cpython/. The source distribution is a gzipped tar file containing the complete C source, Sphinx-formatted documentation, Python library modules, example programs, and several useful pieces of freely distributable software. The source will compile and run out of the box on most UNIX platforms. Consult the Getting Started section of the Python Developer’s Guide for more information on getting the source code and compiling it. How do I get documentation on Python? ------------------------------------- The standard documentation for the current stable version of Python is available at https://docs.python.org/3/. PDF, plain text, and downloadable HTML versions are also available at https://docs.python.org/3/download.html. The documentation is written in reStructuredText and processed by the Sphinx documentation tool. The reStructuredText source for the documentation is part of the Python source distribution. I’ve never programmed before. Is there a Python tutorial? --------------------------------------------------------- There are numerous tutorials and books available. The standard documentation includes The Python Tutorial. Consult the Beginner’s Guide to find information for beginning Python programmers, including lists of tutorials. Is there a newsgroup or mailing list devoted to Python? ------------------------------------------------------- There is a newsgroup, *comp.lang.python*, and a mailing list, python-list. The newsgroup and mailing list are gatewayed into each other – if you can read news it’s unnecessary to subscribe to the mailing list. *comp.lang.python* is high-traffic, receiving hundreds of postings every day, and Usenet readers are often more able to cope with this volume. Announcements of new software releases and events can be found in comp.lang.python.announce, a low-traffic moderated list that receives about five postings per day. It’s available as the python-announce mailing list. More info about other mailing lists and newsgroups can be found at https://www.python.org/community/lists/. How do I get a beta test version of Python? ------------------------------------------- Alpha and beta releases are available from https://www.python.org/downloads/. All releases are announced on the comp.lang.python and comp.lang.python.announce newsgroups and on the Python home page at https://www.python.org/; an RSS feed of news is available. You can also access the development version of Python through Git. See The Python Developer’s Guide for details. How do I submit bug reports and patches for Python? --------------------------------------------------- To report a bug or submit a patch, use the issue tracker at https://github.com/python/cpython/issues. For more information on how Python is developed, consult the Python Developer’s Guide. Are there any published articles about Python that I can reference? ------------------------------------------------------------------- It’s probably best to cite your favorite book about Python. The very first article about Python was written in 1991 and is now quite outdated. Guido van Rossum and Jelke de Boer, “Interactively Testing Remote Servers Using the Python Programming Language”, CWI Quarterly, Volume 4, Issue 4 (December 1991), Amsterdam, pp 283–303. Are there any books on Python?
------------------------------ Yes, there are many, and more are being published. See the python.org wiki at https://wiki.python.org/moin/PythonBooks for a list. You can also search online bookstores for “Python” and filter out the Monty Python references; or perhaps search for “Python” and “language”. Where in the world is www.python.org located? --------------------------------------------- The Python project’s infrastructure is located all over the world and is managed by the Python Infrastructure Team. Details here. Why is it called Python? ------------------------ When he began implementing Python, Guido van Rossum was also reading the published scripts from “Monty Python’s Flying Circus”, a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python. Do I have to like “Monty Python’s Flying Circus”? ------------------------------------------------- No, but it helps. :) Python in the real world ======================== How stable is Python? --------------------- Very stable. New, stable releases have been coming out roughly every 6 to 18 months since 1991, and this seems likely to continue. As of version 3.9, Python will have a new feature release every 12 months (**PEP 602**). The developers issue bugfix releases of older versions, so the stability of existing releases gradually improves. Bugfix releases, indicated by a third component of the version number (e.g. 3.5.3, 3.6.2), are managed for stability; only fixes for known problems are included in a bugfix release, and it’s guaranteed that interfaces will remain the same throughout a series of bugfix releases. The latest stable releases can always be found on the Python download page. Python 3.x is the recommended version and supported by most widely used libraries. Python 2.x **is not maintained anymore**. How many people are using Python? --------------------------------- There are probably millions of users, though it’s difficult to obtain an exact count. Python is available for free download, so there are no sales figures, and it’s available from many different sites and packaged with many Linux distributions, so download statistics don’t tell the whole story either. The comp.lang.python newsgroup is very active, but not all Python users post to the group or even read it. Have any significant projects been done in Python? -------------------------------------------------- See https://www.python.org/about/success for a list of projects that use Python. Consulting the proceedings for past Python conferences will reveal contributions from many different companies and organizations. High-profile Python projects include the Mailman mailing list manager and the Zope application server. Several Linux distributions, most notably Red Hat, have written part or all of their installer and system administration software in Python. Companies that use Python internally include Google, Yahoo, and Lucasfilm Ltd. What new developments are expected for Python in the future? ------------------------------------------------------------ See https://peps.python.org/ for the Python Enhancement Proposals (PEPs). PEPs are design documents describing a suggested new feature for Python, providing a concise technical specification and a rationale. Look for a PEP titled “Python X.Y Release Schedule”, where X.Y is a version that hasn’t been publicly released yet. New development is discussed on the python-dev mailing list. 
Is it reasonable to propose incompatible changes to Python? ----------------------------------------------------------- In general, no. There are already millions of lines of Python code around the world, so any change in the language that invalidates more than a very small fraction of existing programs has to be frowned upon. Even if you can provide a conversion program, there’s still the problem of updating all documentation; many books have been written about Python, and we don’t want to invalidate them all at a single stroke. Providing a gradual upgrade path is necessary if a feature has to be changed. **PEP 5** describes the procedure followed for introducing backward-incompatible changes while minimizing disruption for users. Is Python a good language for beginning programmers? ---------------------------------------------------- Yes. It is still common to start students with a procedural and statically typed language such as Pascal, C, or a subset of C++ or Java. Students may be better served by learning Python as their first language. Python has a very simple and consistent syntax and a large standard library and, most importantly, using Python in a beginning programming course lets students concentrate on important programming skills such as problem decomposition and data type design. With Python, students can be quickly introduced to basic concepts such as loops and procedures. They can probably even work with user-defined objects in their very first course. For a student who has never programmed before, using a statically typed language seems unnatural. It presents additional complexity that the student must master and slows the pace of the course. The students are trying to learn to think like a computer, decompose problems, design consistent interfaces, and encapsulate data. While learning to use a statically typed language is important in the long term, it is not necessarily the best topic to address in the students’ first programming course. Many other aspects of Python make it a good first language. Like Java, Python has a large standard library so that students can be assigned programming projects very early in the course that *do* something. Assignments aren’t restricted to the standard four-function calculator and check balancing programs. By using the standard library, students can gain the satisfaction of working on realistic applications as they learn the fundamentals of programming. Using the standard library also teaches students about code reuse. Third-party modules such as PyGame are also helpful in extending the students’ reach. Python’s interactive interpreter enables students to test language features while they’re programming. They can keep a window with the interpreter running while they enter their program’s source in another window.
If they can’t remember the methods for a list, they can do something like this: >>> L = [] >>> dir(L) ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] >>> [d for d in dir(L) if '__' not in d] ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] >>> help(L.append) Help on built-in function append: append(...) L.append(object) -> None -- append object to end >>> L.append(1) >>> L [1] With the interpreter, documentation is never far from the student as they are programming. There are also good IDEs for Python. IDLE is a cross-platform IDE for Python that is written in Python using Tkinter. Emacs users will be happy to know that there is a very good Python mode for Emacs. All of these programming environments provide syntax highlighting, auto-indenting, and access to the interactive interpreter while coding. Consult the Python wiki for a full list of Python editing environments. If you want to discuss Python’s use in education, you may be interested in joining the edu-sig mailing list. Graphic User Interface FAQ ************************** General GUI Questions ===================== What GUI toolkits exist for Python? =================================== Standard builds of Python include an object-oriented interface to the Tcl/Tk widget set, called tkinter. This is probably the easiest to install (since it comes included with most binary distributions of Python) and use. For more info about Tk, including pointers to the source, see the Tcl/Tk home page. Tcl/Tk is fully portable to the macOS, Windows, and Unix platforms. Depending on what platform(s) you are aiming at, there are also several alternatives. A list of cross-platform and platform-specific GUI frameworks can be found on the python wiki. Tkinter questions ================= How do I freeze Tkinter applications? ------------------------------------- Freeze is a tool to create stand-alone applications. When freezing Tkinter applications, the applications will not be truly stand-alone, as the application will still need the Tcl and Tk libraries. One solution is to ship the application with the Tcl and Tk libraries, and point to them at run-time using the "TCL_LIBRARY" and "TK_LIBRARY" environment variables. Various third-party freeze libraries such as py2exe and cx_Freeze have handling for Tkinter applications built-in. Can I have Tk events handled while waiting for I/O? --------------------------------------------------- On platforms other than Windows, yes, and you don’t even need threads! But you’ll have to restructure your I/O code a bit. Tk has the equivalent of Xt’s "XtAddInput()" call, which allows you to register a callback function which will be called from the Tk mainloop when I/O is possible on a file descriptor. See File Handlers. I can’t get key bindings to work in Tkinter: why? ------------------------------------------------- An often-heard complaint is that event handlers bound to events with the "bind()" method don’t get handled even when the appropriate key is pressed.
The most common cause is that the widget to which the binding applies doesn’t have “keyboard focus”. Check out the Tk documentation for the focus command. Usually a widget is given the keyboard focus by clicking in it (but not for labels; see the takefocus option). Python Frequently Asked Questions ********************************* * General Python FAQ * Programming FAQ * Design and History FAQ * Library and Extension FAQ * Extending/Embedding FAQ * Python on Windows FAQ * Graphic User Interface FAQ * “Why is Python Installed on my Computer?” FAQ “Why is Python Installed on my Computer?” FAQ ********************************************* What is Python? =============== Python is a programming language. It’s used for many different applications. It’s used in some high schools and colleges as an introductory programming language because Python is easy to learn, but it’s also used by professional software developers at places such as Google, NASA, and Lucasfilm Ltd. If you wish to learn more about Python, start with the Beginner’s Guide to Python. Why is Python installed on my machine? ====================================== If you find Python installed on your system but don’t remember installing it, there are several possible ways it could have gotten there. * Perhaps another user on the computer wanted to learn programming and installed it; you’ll have to figure out who’s been using the machine and might have installed it. * A third-party application installed on the machine might have been written in Python and included a Python installation. There are many such applications, from GUI programs to network servers and administrative scripts. * Some Windows machines also have Python installed. At this writing we’re aware of computers from Hewlett-Packard and Compaq that include Python. Apparently some of HP/Compaq’s administrative tools are written in Python. * Many Unix-compatible operating systems, such as macOS and some Linux distributions, have Python installed by default; it’s included in the base installation. Can I delete Python? ==================== That depends on where Python came from. If someone installed it deliberately, you can remove it without hurting anything. On Windows, use the Add/Remove Programs icon in the Control Panel. If Python was installed by a third-party application, you can also remove it, but that application will no longer work. You should use that application’s uninstaller rather than removing Python directly. If Python came with your operating system, removing it is not recommended. If you remove it, whatever tools were written in Python will no longer run, and some of them might be important to you. Reinstalling the whole system would then be required to fix things again. Library and Extension FAQ ************************* General Library Questions ========================= How do I find a module or application to perform task X? -------------------------------------------------------- Check the Library Reference to see if there’s a relevant standard library module. (Eventually you’ll learn what’s in the standard library and will be able to skip this step.) For third-party packages, search the Python Package Index or try Google or another web search engine. Searching for “Python” plus a keyword or two for your topic of interest will usually find something helpful. Where is the math.py (socket.py, regex.py, etc.) source file? 
------------------------------------------------------------- If you can’t find a source file for a module it may be a built-in or dynamically loaded module implemented in C, C++ or other compiled language. In this case you may not have the source file or it may be something like "mathmodule.c", somewhere in a C source directory (not on the Python Path). There are (at least) three kinds of modules in Python: 1. modules written in Python (.py); 2. modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc); 3. modules written in C and linked with the interpreter; to get a list of these, type: import sys print(sys.builtin_module_names) How do I make a Python script executable on Unix? ------------------------------------------------- You need to do two things: the script file’s mode must be executable and the first line must begin with "#!" followed by the path of the Python interpreter. The first is done by executing "chmod +x scriptfile" or perhaps "chmod 755 scriptfile". The second can be done in a number of ways. The most straightforward way is to write #!/usr/local/bin/python as the very first line of your file, using the pathname for where the Python interpreter is installed on your platform. If you would like the script to be independent of where the Python interpreter lives, you can use the **env** program. Almost all Unix variants support the following, assuming the Python interpreter is in a directory on the user’s "PATH": #!/usr/bin/env python *Don’t* do this for CGI scripts. The "PATH" variable for CGI scripts is often very minimal, so you need to use the actual absolute pathname of the interpreter. Occasionally, a user’s environment is so full that the **/usr/bin/env** program fails; or there’s no env program at all. In that case, you can try the following hack (due to Alex Rezinsky): #! /bin/sh """:" exec python $0 ${1+"$@"} """ The minor disadvantage is that this defines the script’s __doc__ string. However, you can fix that by adding __doc__ = """...Whatever...""" Is there a curses/termcap package for Python? --------------------------------------------- For Unix variants: The standard Python source distribution comes with a curses module in the Modules subdirectory, though it’s not compiled by default. (Note that this is not available in the Windows distribution – there is no curses module for Windows.) The "curses" module supports basic curses features as well as many additional functions from ncurses and SYSV curses such as colour, alternative character set support, pads, and mouse support. This means the module isn’t compatible with operating systems that only have BSD curses, but there don’t seem to be any currently maintained OSes that fall into this category. Is there an equivalent to C’s onexit() in Python? ------------------------------------------------- The "atexit" module provides a register function that is similar to C’s "onexit()". Why don’t my signal handlers work? ---------------------------------- The most common problem is that the signal handler is declared with the wrong argument list. It is called as handler(signum, frame) so it should be declared with two parameters: def handler(signum, frame): ... Common tasks ============ How do I test a Python program or component? -------------------------------------------- Python comes with two testing frameworks. The "doctest" module finds examples in the docstrings for a module and runs them, comparing the output with the expected output given in the docstring. 
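For instance, here is a minimal sketch of a function whose docstring carries a checkable example (the function and its numbers are made up for illustration):

   def average(values):
       """Return the arithmetic mean of a list of numbers.

       >>> average([20, 30, 70])
       40.0
       """
       return sum(values) / len(values)

   if __name__ == "__main__":
       import doctest
       doctest.testmod()   # silent on success; reports any failing examples

Running the module directly executes the embedded example and compares the actual output against "40.0".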
The "unittest" module is a fancier testing framework modelled on Java and Smalltalk testing frameworks. To make testing easier, you should use good modular design in your program. Your program should have almost all functionality encapsulated in either functions or class methods – and this sometimes has the surprising and delightful effect of making the program run faster (because local variable accesses are faster than global accesses). Furthermore the program should avoid depending on mutating global variables, since this makes testing much more difficult to do. The “global main logic” of your program may be as simple as if __name__ == "__main__": main_logic() at the bottom of the main module of your program. Once your program is organized as a tractable collection of function and class behaviours, you should write test functions that exercise the behaviours. A test suite that automates a sequence of tests can be associated with each module. This sounds like a lot of work, but since Python is so terse and flexible it’s surprisingly easy. You can make coding much more pleasant and fun by writing your test functions in parallel with the “production code”, since this makes it easy to find bugs and even design flaws earlier. “Support modules” that are not intended to be the main module of a program may include a self-test of the module. if __name__ == "__main__": self_test() Even programs that interact with complex external interfaces may be tested when the external interfaces are unavailable by using “fake” interfaces implemented in Python. How do I create documentation from doc strings? ----------------------------------------------- The "pydoc" module can create HTML from the doc strings in your Python source code. An alternative for creating API documentation purely from docstrings is epydoc. Sphinx can also include docstring content. How do I get a single keypress at a time? ----------------------------------------- For Unix variants there are several solutions. It’s straightforward to do this using curses, but curses is a fairly large module to learn. Threads ======= How do I program using threads? ------------------------------- Be sure to use the "threading" module and not the "_thread" module. The "threading" module builds convenient abstractions on top of the low-level primitives provided by the "_thread" module. None of my threads seem to run: why? ------------------------------------ As soon as the main thread exits, all threads are killed. Your main thread is running too quickly, giving the threads no time to do any work. A simple fix is to add a sleep to the end of the program that’s long enough for all the threads to finish: import threading, time def thread_task(name, n): for i in range(n): print(name, i) for i in range(10): T = threading.Thread(target=thread_task, args=(str(i), i)) T.start() time.sleep(10) # <---------------------------! But now (on many platforms) the threads don’t run in parallel, but appear to run sequentially, one at a time! The reason is that the OS thread scheduler doesn’t start a new thread until the previous thread is blocked. A simple fix is to add a tiny sleep to the start of the run function: def thread_task(name, n): time.sleep(0.001) # <--------------------! for i in range(n): print(name, i) for i in range(10): T = threading.Thread(target=thread_task, args=(str(i), i)) T.start() time.sleep(10) Instead of trying to guess a good delay value for "time.sleep()", it’s better to use some kind of semaphore mechanism. 
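For example, here is a minimal sketch of one such mechanism, using a "threading.Event" per thread so the main thread waits instead of sleeping (in simple cases, "Thread.join()" achieves the same thing):

   import threading

   def thread_task(name, n, done):
       for i in range(n):
           print(name, i)
       done.set()                  # signal that this thread has finished

   events = []
   for i in range(10):
       done = threading.Event()
       events.append(done)
       threading.Thread(target=thread_task, args=(str(i), i, done)).start()

   for done in events:
       done.wait()                 # block until every thread has signalled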
One idea is to use the "queue" module to create a queue object, let each thread append a token to the queue when it finishes, and let the main thread read as many tokens from the queue as there are threads. How do I parcel out work among a bunch of worker threads? --------------------------------------------------------- The easiest way is to use the "concurrent.futures" module, especially the "ThreadPoolExecutor" class. Or, if you want fine control over the dispatching algorithm, you can write your own logic manually. Use the "queue" module to create a queue containing a list of jobs. The "Queue" class maintains a list of objects and has a ".put(obj)" method that adds items to the queue and a ".get()" method to return them. The class will take care of the locking necessary to ensure that each job is handed out exactly once. Here’s a trivial example: import threading, queue, time # The worker thread gets jobs off the queue. When the queue is empty, it # assumes there will be no more work and exits. # (Realistically workers will run until terminated.) def worker(): print('Running worker') time.sleep(0.1) while True: try: arg = q.get(block=False) except queue.Empty: print('Worker', threading.current_thread(), end=' ') print('queue empty') break else: print('Worker', threading.current_thread(), end=' ') print('running with argument', arg) time.sleep(0.5) # Create queue q = queue.Queue() # Start a pool of 5 workers for i in range(5): t = threading.Thread(target=worker, name='worker %i' % (i+1)) t.start() # Begin adding work to the queue for i in range(50): q.put(i) # Give threads time to run print('Main thread sleeping') time.sleep(5) When run, this will produce the following output: Running worker Running worker Running worker Running worker Running worker Main thread sleeping Worker running with argument 0 Worker running with argument 1 Worker running with argument 2 Worker running with argument 3 Worker running with argument 4 Worker running with argument 5 ... Consult the module’s documentation for more details; the "Queue" class provides a featureful interface. What kinds of global value mutation are thread-safe? ---------------------------------------------------- A *global interpreter lock* (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python offers to switch among threads only between bytecode instructions; how frequently it switches can be set via "sys.setswitchinterval()". Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program. In theory, this means an exact accounting requires an exact understanding of the PVM bytecode implementation. In practice, it means that operations on shared variables of built-in data types (ints, lists, dicts, etc) that “look atomic” really are. For example, the following operations are all atomic (L, L1, L2 are lists, D, D1, D2 are dicts, x, y are objects, i, j are ints): L.append(x) L1.extend(L2) x = L[i] x = L.pop() L1[i:j] = L2 L.sort() x = y x.field = y D[x] = y D1.update(D2) D.keys() These aren’t: i = i+1 L.append(L[-1]) L[i] = L[j] D[x] = D[x] + 1 Operations that replace other objects may invoke those other objects’ "__del__()" method when their reference count reaches zero, and that can affect things. This is especially true for the mass updates to dictionaries and lists. When in doubt, use a mutex! Can’t we get rid of the Global Interpreter Lock? 
------------------------------------------------ The *global interpreter lock* (GIL) is often seen as a hindrance to Python’s deployment on high-end multiprocessor server machines, because a multi-threaded Python program effectively only uses one CPU, due to the insistence that (almost) all Python code can only run while the GIL is held. With the approval of **PEP 703** work is now underway to remove the GIL from the CPython implementation of Python. Initially it will be implemented as an optional compiler flag when building the interpreter, and so separate builds will be available with and without the GIL. Long-term, the hope is to settle on a single build, once the performance implications of removing the GIL are fully understood. Python 3.13 is likely to be the first release containing this work, although it may not be completely functional in this release. The current work to remove the GIL is based on a fork of Python 3.9 with the GIL removed by Sam Gross. Prior to that, in the days of Python 1.5, Greg Stein actually implemented a comprehensive patch set (the “free threading” patches) that removed the GIL and replaced it with fine-grained locking. Adam Olsen did a similar experiment in his python-safethread project. Unfortunately, both of these earlier experiments exhibited a sharp drop in single-thread performance (at least 30% slower), due to the amount of fine-grained locking necessary to compensate for the removal of the GIL. The Python 3.9 fork is the first attempt at removing the GIL with an acceptable performance impact. The presence of the GIL in current Python releases doesn’t mean that you can’t make good use of Python on multi-CPU machines! You just have to be creative with dividing the work up between multiple *processes* rather than multiple *threads*. The "ProcessPoolExecutor" class in the new "concurrent.futures" module provides an easy way of doing so; the "multiprocessing" module provides a lower-level API in case you want more control over dispatching of tasks. Judicious use of C extensions will also help; if you use a C extension to perform a time-consuming task, the extension can release the GIL while the thread of execution is in the C code and allow other threads to get some work done. Some standard library modules such as "zlib" and "hashlib" already do this. An alternative approach to reducing the impact of the GIL is to make the GIL a per-interpreter-state lock rather than truly global. This was first implemented in Python 3.12 and is available in the C API. A Python interface to it is expected in Python 3.13. The main limitation to it at the moment is likely to be 3rd party extension modules, since these must be written with multiple interpreters in mind in order to be usable, so many older extension modules will not be usable. Input and Output ================ How do I delete a file? (And other file questions…) --------------------------------------------------- Use "os.remove(filename)" or "os.unlink(filename)"; for documentation, see the "os" module. The two functions are identical; "unlink()" is simply the name of the Unix system call for this function. To remove a directory, use "os.rmdir()"; use "os.mkdir()" to create one. "os.makedirs(path)" will create any intermediate directories in "path" that don’t exist. "os.removedirs(path)" will remove intermediate directories as long as they’re empty; if you want to delete an entire directory tree and its contents, use "shutil.rmtree()". To rename a file, use "os.rename(old_path, new_path)". 
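For instance, here is a short sketch tying several of these calls together (the directory and file names are hypothetical):

   import os
   import shutil

   os.makedirs("build/logs", exist_ok=True)    # create intermediate directories
   with open("build/logs/run.txt", "w") as f:
       f.write("done\n")
   os.rename("build/logs/run.txt", "build/logs/run.old")   # rename a file
   os.remove("build/logs/run.old")             # delete a single file
   shutil.rmtree("build")                      # delete the whole tree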
To truncate a file, open it using "f = open(filename, "rb+")", and use "f.truncate(offset)"; offset defaults to the current seek position. There’s also "os.ftruncate(fd, offset)" for files opened with "os.open()", where *fd* is the file descriptor (a small integer). The "shutil" module also contains a number of functions to work on files including "copyfile()", "copytree()", and "rmtree()". How do I copy a file? --------------------- The "shutil" module contains a "copyfile()" function. Note that on Windows NTFS volumes, it does not copy alternate data streams nor resource forks on macOS HFS+ volumes, though both are now rarely used. It also doesn’t copy file permissions and metadata, though using "shutil.copy2()" instead will preserve most (though not all) of it. How do I read (or write) binary data? ------------------------------------- To read or write complex binary data formats, it’s best to use the "struct" module. It allows you to take a string containing binary data (usually numbers) and convert it to Python objects; and vice versa. For example, the following code reads two 2-byte integers and one 4-byte integer in big-endian format from a file: import struct with open(filename, "rb") as f: s = f.read(8) x, y, z = struct.unpack(">hhl", s) The ‘>’ in the format string forces big-endian data; the letter ‘h’ reads one “short integer” (2 bytes), and ‘l’ reads one “long integer” (4 bytes) from the string. For data that is more regular (e.g. a homogeneous list of ints or floats), you can also use the "array" module. Note: To read and write binary data, it is mandatory to open the file in binary mode (here, passing ""rb"" to "open()"). If you use ""r"" instead (the default), the file will be open in text mode and "f.read()" will return "str" objects rather than "bytes" objects. I can’t seem to use os.read() on a pipe created with os.popen(); why? --------------------------------------------------------------------- "os.read()" is a low-level function which takes a file descriptor, a small integer representing the opened file. "os.popen()" creates a high-level file object, the same type returned by the built-in "open()" function. Thus, to read *n* bytes from a pipe *p* created with "os.popen()", you need to use "p.read(n)". How do I access the serial (RS232) port? ---------------------------------------- For Win32, OSX, Linux, BSD, Jython, IronPython: pyserial For Unix, see a Usenet post by Mitch Chapman: https://groups.google.com/groups?selm=34A04430.CF9@ohioee.com Why doesn’t closing sys.stdout (stdin, stderr) really close it? --------------------------------------------------------------- Python *file objects* are a high-level layer of abstraction on low-level C file descriptors. For most file objects you create in Python via the built-in "open()" function, "f.close()" marks the Python file object as being closed from Python’s point of view, and also arranges to close the underlying C file descriptor. This also happens automatically in "f"’s destructor, when "f" becomes garbage. But stdin, stdout and stderr are treated specially by Python, because of the special status also given to them by C. Running "sys.stdout.close()" marks the Python-level file object as being closed, but does *not* close the associated C file descriptor. To close the underlying C file descriptor for one of these three, you should first be sure that’s what you really want to do (e.g., you may confuse extension modules trying to do I/O).
If it is, use "os.close()": os.close(stdin.fileno()) os.close(stdout.fileno()) os.close(stderr.fileno()) Or you can use the numeric constants 0, 1 and 2, respectively. Network/Internet Programming ============================ What WWW tools are there for Python? ------------------------------------ See the chapters titled Internet Protocols and Support and Internet Data Handling in the Library Reference Manual. Python has many modules that will help you build server-side and client-side web systems. A summary of available frameworks is maintained by Paul Boddie at https://wiki.python.org/moin/WebProgramming. What module should I use to help with generating HTML? ------------------------------------------------------ You can find a collection of useful links on the Web Programming wiki page. How do I send mail from a Python script? ---------------------------------------- Use the standard library module "smtplib". Here’s a very simple interactive mail sender that uses it. This method will work on any host that supports an SMTP listener. import sys, smtplib fromaddr = input("From: ") toaddrs = input("To: ").split(',') print("Enter message, end with ^D:") msg = '' while True: line = sys.stdin.readline() if not line: break msg += line # The actual mail send server = smtplib.SMTP('localhost') server.sendmail(fromaddr, toaddrs, msg) server.quit() A Unix-only alternative uses sendmail. The location of the sendmail program varies between systems; sometimes it is "/usr/lib/sendmail", sometimes "/usr/sbin/sendmail". The sendmail manual page will help you out. Here’s some sample code: import os SENDMAIL = "/usr/sbin/sendmail" # sendmail location p = os.popen("%s -t -i" % SENDMAIL, "w") p.write("To: receiver@example.com\n") p.write("Subject: test\n") p.write("\n") # blank line separating headers from body p.write("Some text\n") p.write("some more text\n") sts = p.close() if sts != 0: print("Sendmail exit status", sts) How do I avoid blocking in the connect() method of a socket? ------------------------------------------------------------ The "select" module is commonly used to help with asynchronous I/O on sockets. To prevent the TCP connect from blocking, you can set the socket to non-blocking mode. Then when you do the "connect()", you will either connect immediately (unlikely) or get an exception that contains the error number as ".errno". "errno.EINPROGRESS" indicates that the connection is in progress, but hasn’t finished yet. Different OSes will return different values, so you’re going to have to check what’s returned on your system. You can use the "connect_ex()" method to avoid creating an exception. It will just return the errno value. To poll, you can call "connect_ex()" again later – "0" or "errno.EISCONN" indicate that you’re connected – or you can pass this socket to "select.select()" to check if it’s writable. Note: The "asyncio" module provides a general purpose single-threaded and concurrent asynchronous library, which can be used for writing non- blocking network code. The third-party Twisted library is a popular and feature-rich alternative. Databases ========= Are there any interfaces to database packages in Python? -------------------------------------------------------- Yes. Interfaces to disk-based hashes such as "DBM" and "GDBM" are also included with standard Python. There is also the "sqlite3" module, which provides a lightweight disk-based relational database. Support for most relational databases is available. See the DatabaseProgramming wiki page for details. 
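For example, a minimal sketch using the bundled "sqlite3" module (the table, columns, and data are made up for illustration):

   import sqlite3

   con = sqlite3.connect(":memory:")   # pass a filename for an on-disk database
   cur = con.cursor()
   cur.execute("CREATE TABLE points (x INTEGER, y INTEGER)")
   cur.executemany("INSERT INTO points VALUES (?, ?)", [(1, 2), (3, 4)])
   con.commit()
   for row in cur.execute("SELECT x, y FROM points ORDER BY x"):
       print(row)                      # (1, 2) then (3, 4)
   con.close()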
How do you implement persistent objects in Python? -------------------------------------------------- The "pickle" library module solves this in a very general way (though you still can’t store things like open files, sockets or windows), and the "shelve" library module uses pickle and (g)dbm to create persistent mappings containing arbitrary Python objects. Mathematics and Numerics ======================== How do I generate random numbers in Python? ------------------------------------------- The standard module "random" implements a random number generator. Usage is simple: import random random.random() This returns a random floating-point number in the range [0, 1). There are also many other specialized generators in this module, such as: * "randrange(a, b)" chooses an integer in the range [a, b). * "uniform(a, b)" chooses a floating-point number in the range [a, b). * "normalvariate(mean, sdev)" samples the normal (Gaussian) distribution. Some higher-level functions operate on sequences directly, such as: * "choice(S)" chooses a random element from a given sequence. * "shuffle(L)" shuffles a list in-place, i.e. permutes it randomly. There’s also a "Random" class you can instantiate to create independent multiple random number generators. Programming FAQ *************** General Questions ================= Is there a source code level debugger with breakpoints, single-stepping, etc.? ------------------------------------------------------------------------------ Yes. Several debuggers for Python are described below, and the built-in function "breakpoint()" allows you to drop into any of them. The pdb module is a simple but adequate console-mode debugger for Python. It is part of the standard Python library, and is "documented in the Library Reference Manual". You can also write your own debugger by using the code for pdb as an example. The IDLE interactive development environment, which is part of the standard Python distribution (normally available as Tools/scripts/idle3), includes a graphical debugger. PythonWin is a Python IDE that includes a GUI debugger based on pdb. The PythonWin debugger colors breakpoints and has quite a few cool features such as debugging non-PythonWin programs. PythonWin is available as part of pywin32 project and as a part of the ActivePython distribution. Eric is an IDE built on PyQt and the Scintilla editing component. trepan3k is a gdb-like debugger. Visual Studio Code is an IDE with debugging tools that integrates with version-control software. There are a number of commercial Python IDEs that include graphical debuggers. They include: * Wing IDE * Komodo IDE * PyCharm Are there tools to help find bugs or perform static analysis? ------------------------------------------------------------- Yes. Pylint and Pyflakes do basic checking that will help you catch bugs sooner. Static type checkers such as Mypy, Pyre, and Pytype can check type hints in Python source code. How can I create a stand-alone binary from a Python script? ----------------------------------------------------------- You don’t need the ability to compile Python to C code if all you want is a stand-alone program that users can download and run without having to install the Python distribution first. There are a number of tools that determine the set of modules required by a program and bind these modules together with a Python binary to produce a single executable. One is to use the freeze tool, which is included in the Python source tree as Tools/freeze. 
It converts Python byte code to C arrays; with a C compiler you can
embed all your modules into a new program, which is then linked with
the standard Python modules.

It works by scanning your source recursively for import statements
(in both forms) and looking for the modules in the standard Python
path as well as in the source directory (for built-in modules).  It
then turns the bytecode for modules written in Python into C code
(array initializers that can be turned into code objects using the
marshal module) and creates a custom-made config file that only
contains those built-in modules which are actually used in the
program.  It then compiles the generated C code and links it with the
rest of the Python interpreter to form a self-contained binary which
acts exactly like your script.

The following packages can help with the creation of console and GUI
executables:

* Nuitka (Cross-platform)

* PyInstaller (Cross-platform)

* PyOxidizer (Cross-platform)

* cx_Freeze (Cross-platform)

* py2app (macOS only)

* py2exe (Windows only)

Are there coding standards or a style guide for Python programs?
----------------------------------------------------------------

Yes.  The coding style required for standard library modules is
documented as **PEP 8**.

Core Language
=============

Why am I getting an UnboundLocalError when the variable has a value?
--------------------------------------------------------------------

It can be a surprise to get the "UnboundLocalError" in previously
working code when it is modified by adding an assignment statement
somewhere in the body of a function.

This code:

   >>> x = 10
   >>> def bar():
   ...     print(x)
   ...
   >>> bar()
   10

works, but this code:

   >>> x = 10
   >>> def foo():
   ...     print(x)
   ...     x += 1

results in an "UnboundLocalError":

   >>> foo()
   Traceback (most recent call last):
     ...
   UnboundLocalError: local variable 'x' referenced before assignment

This is because when you make an assignment to a variable in a scope,
that variable becomes local to that scope and shadows any similarly
named variable in the outer scope.  Since the last statement in foo
assigns a new value to "x", the compiler recognizes it as a local
variable.  Consequently, when the earlier "print(x)" attempts to
print the uninitialized local variable, an error results.

In the example above you can access the outer scope variable by
declaring it global:

   >>> x = 10
   >>> def foobar():
   ...     global x
   ...     print(x)
   ...     x += 1
   ...
   >>> foobar()
   10

This explicit declaration is required in order to remind you that
(unlike the superficially analogous situation with class and instance
variables) you are actually modifying the value of the variable in
the outer scope:

   >>> print(x)
   11

You can do a similar thing in a nested scope using the "nonlocal"
keyword:

   >>> def foo():
   ...     x = 10
   ...     def bar():
   ...         nonlocal x
   ...         print(x)
   ...         x += 1
   ...     bar()
   ...     print(x)
   ...
   >>> foo()
   10
   11

What are the rules for local and global variables in Python?
------------------------------------------------------------

In Python, variables that are only referenced inside a function are
implicitly global.  If a variable is assigned a value anywhere within
the function’s body, it’s assumed to be a local unless explicitly
declared as global.

Though a bit surprising at first, a moment’s consideration explains
this.  On one hand, requiring "global" for assigned variables
provides a bar against unintended side-effects.  On the other hand,
if "global" was required for all global references, you’d be using
"global" all the time.
You’d have to declare as global every reference to a built-in
function or to a component of an imported module.  This clutter would
defeat the usefulness of the "global" declaration for identifying
side-effects.

Why do lambdas defined in a loop with different values all return the same result?
----------------------------------------------------------------------------------

Assume you use a for loop to define a few different lambdas (or even
plain functions), e.g.:

   >>> squares = []
   >>> for x in range(5):
   ...     squares.append(lambda: x**2)

This gives you a list that contains 5 lambdas that calculate "x**2".
You might expect that, when called, they would return, respectively,
"0", "1", "4", "9", and "16".  However, when you actually try you
will see that they all return "16":

   >>> squares[2]()
   16
   >>> squares[4]()
   16

This happens because "x" is not local to the lambdas, but is defined
in the outer scope, and it is accessed when the lambda is called —
not when it is defined.  At the end of the loop, the value of "x" is
"4", so all the functions now return "4**2", i.e. "16".  You can also
verify this by changing the value of "x" and see how the results of
the lambdas change:

   >>> x = 8
   >>> squares[2]()
   64

In order to avoid this, you need to save the values in variables
local to the lambdas, so that they don’t rely on the value of the
global "x":

   >>> squares = []
   >>> for x in range(5):
   ...     squares.append(lambda n=x: n**2)

Here, "n=x" creates a new variable "n" local to the lambda, computed
when the lambda is defined so that it has the same value that "x" had
at that point in the loop.  This means that the value of "n" will be
"0" in the first lambda, "1" in the second, "2" in the third, and so
on.  Therefore each lambda will now return the correct result:

   >>> squares[2]()
   4
   >>> squares[4]()
   16

Note that this behaviour is not peculiar to lambdas, but applies to
regular functions too.

How do I share global variables across modules?
-----------------------------------------------

The canonical way to share information across modules within a single
program is to create a special module (often called config or cfg).
Just import the config module in all modules of your application; the
module then becomes available as a global name.  Because there is
only one instance of each module, any changes made to the module
object get reflected everywhere.  For example:

config.py:

   x = 0   # Default value of the 'x' configuration setting

mod.py:

   import config
   config.x = 1

main.py:

   import config
   import mod
   print(config.x)

Note that using a module is also the basis for implementing the
singleton design pattern, for the same reason.

What are the “best practices” for using import in a module?
-----------------------------------------------------------

In general, don’t use "from modulename import *".  Doing so clutters
the importer’s namespace, and makes it much harder for linters to
detect undefined names.

Import modules at the top of a file.  Doing so makes it clear what
other modules your code requires and avoids questions of whether the
module name is in scope.  Using one import per line makes it easy to
add and delete module imports, but using multiple imports per line
uses less screen space.

It’s good practice if you import modules in the following order:

1. standard library modules – e.g. "sys", "os", "argparse", "re"

2. third-party library modules (anything installed in Python’s site-
   packages directory) – e.g. "dateutil", "requests", "PIL.Image"

3. locally developed modules
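Laid out concretely, an import block following that order might look
like this (the third-party and local module names are purely
illustrative):

   # 1. Standard library modules
   import os
   import sys

   # 2. Third-party modules (installed into site-packages)
   import requests

   # 3. Locally developed modules
   import myproject.utils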
It is sometimes necessary to move imports to a function or class to
avoid problems with circular imports.  Gordon McMillan says:

   Circular imports are fine where both modules use the
   “import <module>” form of import.  They fail when the 2nd module
   wants to grab a name out of the first (“from module import name”)
   and the import is at the top level.  That’s because names in the
   1st are not yet available, because the first module is busy
   importing the 2nd.

In this case, if the second module is only used in one function, then
the import can easily be moved into that function.  By the time the
import is called, the first module will have finished initializing,
and the second module can do its import.

It may also be necessary to move imports out of the top level of code
if some of the modules are platform-specific.  In that case, it may
not even be possible to import all of the modules at the top of the
file.  In this case, importing the correct modules in the
corresponding platform-specific code is a good option.

Only move imports into a local scope, such as inside a function
definition, if it’s necessary to solve a problem such as avoiding a
circular import, or if you are trying to reduce the initialization
time of a module.  This technique is especially helpful if many of
the imports are unnecessary depending on how the program executes.
You may also want to move imports into a function if the modules are
only ever used in that function.  Note that loading a module the
first time may be expensive because of the one-time initialization of
the module, but loading a module multiple times is virtually free,
costing only a couple of dictionary lookups.  Even if the module name
has gone out of scope, the module is probably available in
"sys.modules".

Why are default values shared between objects?
----------------------------------------------

This type of bug commonly bites neophyte programmers.  Consider this
function:

   def foo(mydict={}):  # Danger: shared reference to one dict for all calls
       ... compute something ...
       mydict[key] = value
       return mydict

The first time you call this function, "mydict" contains a single
item.  The second time, "mydict" contains two items because when
"foo()" begins executing, "mydict" starts out with an item already in
it.

It is often expected that a function call creates new objects for
default values.  This is not what happens.  Default values are
created exactly once, when the function is defined.  If that object
is changed, like the dictionary in this example, subsequent calls to
the function will refer to this changed object.

By definition, immutable objects such as numbers, strings, tuples,
and "None", are safe from change.  Changes to mutable objects such as
dictionaries, lists, and class instances can lead to confusion.

Because of this feature, it is good programming practice to not use
mutable objects as default values.  Instead, use "None" as the
default value and inside the function, check if the parameter is
"None" and create a new list/dictionary/whatever if it is.  For
example, don’t write:

   def foo(mydict={}):
       ...

but:

   def foo(mydict=None):
       if mydict is None:
           mydict = {}  # create a new dict for local namespace

This feature can be useful.  When you have a function that’s time-
consuming to compute, a common technique is to cache the parameters
and the resulting value of each call to the function, and return the
cached value if the same value is requested again.  This is called
“memoizing”, and can be implemented like this:

   # Callers can only provide two parameters and optionally pass
   # _cache by keyword
   def expensive(arg1, arg2, *, _cache={}):
       if (arg1, arg2) in _cache:
           return _cache[(arg1, arg2)]

       # Calculate the value
       result = ... expensive computation ...
       _cache[(arg1, arg2)] = result  # Store result in the cache
       return result

You could use a global variable containing a dictionary instead of
the default value; it’s a matter of taste.
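The standard library also packages this caching pattern up for you.
Here is a hedged sketch of the same idea using "functools.lru_cache"
(the power computation is just a stand-in for something expensive):

   from functools import lru_cache

   @lru_cache(maxsize=None)  # results are keyed by the (hashable) arguments
   def expensive(arg1, arg2):
       return arg1 ** arg2   # stand-in for an expensive computation

   expensive(2, 1000)  # computed
   expensive(2, 1000)  # returned from the cache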
How can I pass optional or keyword parameters from one function to another?
---------------------------------------------------------------------------

Collect the arguments using the "*" and "**" specifiers in the
function’s parameter list; this gives you the positional arguments as
a tuple and the keyword arguments as a dictionary.  You can then pass
these arguments when calling another function by using "*" and "**":

   def f(x, *args, **kwargs):
       ...
       kwargs['width'] = '14.3c'
       ...
       g(x, *args, **kwargs)

What is the difference between arguments and parameters?
--------------------------------------------------------

*Parameters* are defined by the names that appear in a function
definition, whereas *arguments* are the values actually passed to a
function when calling it.  Parameters define what *kind of arguments*
a function can accept.  For example, given the function definition:

   def func(foo, bar=None, **kwargs):
       pass

*foo*, *bar* and *kwargs* are parameters of "func".  However, when
calling "func", for example:

   func(42, bar=314, extra=somevar)

the values "42", "314", and "somevar" are arguments.

Why did changing list ‘y’ also change list ‘x’?
-----------------------------------------------

If you wrote code like:

   >>> x = []
   >>> y = x
   >>> y.append(10)
   >>> y
   [10]
   >>> x
   [10]

you might be wondering why appending an element to "y" changed "x"
too.

There are two factors that produce this result:

1. Variables are simply names that refer to objects.  Doing "y = x"
   doesn’t create a copy of the list – it creates a new variable "y"
   that refers to the same object "x" refers to.  This means that
   there is only one object (the list), and both "x" and "y" refer to
   it.

2. Lists are *mutable*, which means that you can change their
   content.

After the call to "append()", the content of the mutable object has
changed from "[]" to "[10]".  Since both the variables refer to the
same object, using either name accesses the modified value "[10]".

If we instead assign an immutable object to "x":

   >>> x = 5  # ints are immutable
   >>> y = x
   >>> x = x + 1  # 5 can't be mutated, we are creating a new object here
   >>> x
   6
   >>> y
   5

we can see that in this case "x" and "y" are not equal anymore.  This
is because integers are *immutable*, and when we do "x = x + 1" we
are not mutating the int "5" by incrementing its value; instead, we
are creating a new object (the int "6") and assigning it to "x" (that
is, changing which object "x" refers to).  After this assignment we
have two objects (the ints "6" and "5") and two variables that refer
to them ("x" now refers to "6" but "y" still refers to "5").

Some operations (for example "y.append(10)" and "y.sort()") mutate
the object, whereas superficially similar operations (for example
"y = y + [10]" and "sorted(y)") create a new object.  In general in
Python (and in all cases in the standard library) a method that
mutates an object will return "None" to help avoid getting the two
types of operations confused.
So if you mistakenly write "y.sort()" thinking it will give you a
sorted copy of "y", you’ll instead end up with "None", which will
likely cause your program to generate an easily diagnosed error.

However, there is one class of operations where the same operation
sometimes has different behaviors with different types: the augmented
assignment operators.  For example, "+=" mutates lists but not tuples
or ints ("a_list += [1, 2, 3]" is equivalent to
"a_list.extend([1, 2, 3])" and mutates "a_list", whereas
"some_tuple += (1, 2, 3)" and "some_int += 1" create new objects).

In other words:

* If we have a mutable object ("list", "dict", "set", etc.), we can
  use some specific operations to mutate it and all the variables
  that refer to it will see the change.

* If we have an immutable object ("str", "int", "tuple", etc.), all
  the variables that refer to it will always see the same value, but
  operations that transform that value into a new value always return
  a new object.

If you want to know if two variables refer to the same object or not,
you can use the "is" operator, or the built-in function "id()".

How do I write a function with output parameters (call by reference)?
---------------------------------------------------------------------

Remember that arguments are passed by assignment in Python.  Since
assignment just creates references to objects, there’s no alias
between an argument name in the caller and callee, and so no call-by-
reference per se.  You can achieve the desired effect in a number of
ways.

1. By returning a tuple of the results:

      >>> def func1(a, b):
      ...     a = 'new-value'     # a and b are local names
      ...     b = b + 1           # assigned to new objects
      ...     return a, b         # return new values
      ...
      >>> x, y = 'old-value', 99
      >>> func1(x, y)
      ('new-value', 100)

   This is almost always the clearest solution.

2. By using global variables.  This isn’t thread-safe, and is not
   recommended.

3. By passing a mutable (changeable in-place) object:

      >>> def func2(a):
      ...     a[0] = 'new-value'     # 'a' references a mutable list
      ...     a[1] = a[1] + 1        # changes a shared object
      ...
      >>> args = ['old-value', 99]
      >>> func2(args)
      >>> args
      ['new-value', 100]

4. By passing in a dictionary that gets mutated:

      >>> def func3(args):
      ...     args['a'] = 'new-value'     # args is a mutable dictionary
      ...     args['b'] = args['b'] + 1   # change it in-place
      ...
      >>> args = {'a': 'old-value', 'b': 99}
      >>> func3(args)
      >>> args
      {'a': 'new-value', 'b': 100}

5. Or bundle up values in a class instance:

      >>> class Namespace:
      ...     def __init__(self, /, **args):
      ...         for key, value in args.items():
      ...             setattr(self, key, value)
      ...
      >>> def func4(args):
      ...     args.a = 'new-value'     # args is a mutable Namespace
      ...     args.b = args.b + 1      # change object in-place
      ...
      >>> args = Namespace(a='old-value', b=99)
      >>> func4(args)
      >>> vars(args)
      {'a': 'new-value', 'b': 100}

   There’s almost never a good reason to get this complicated.

Your best choice is to return a tuple containing the multiple
results.

How do you make a higher order function in Python?
--------------------------------------------------

You have two choices: you can use nested scopes or you can use
callable objects.  For example, suppose you wanted to define
"linear(a,b)" which returns a function "f(x)" that computes the value
"a*x+b".
Using nested scopes:

   def linear(a, b):
       def result(x):
           return a * x + b
       return result

Or using a callable object:

   class linear:
       def __init__(self, a, b):
           self.a, self.b = a, b

       def __call__(self, x):
           return self.a * x + self.b

In both cases,

   taxes = linear(0.3, 2)

gives a callable object where "taxes(10e6) == 0.3 * 10e6 + 2".

The callable object approach has the disadvantage that it is a bit
slower and results in slightly longer code.  However, note that a
collection of callables can share their signature via inheritance:

   class exponential(linear):
       # __init__ inherited
       def __call__(self, x):
           return self.a * (x ** self.b)

An object can encapsulate state for several methods:

   class counter:

       value = 0

       def set(self, x):
           self.value = x

       def up(self):
           self.value = self.value + 1

       def down(self):
           self.value = self.value - 1

   count = counter()
   inc, dec, reset = count.up, count.down, count.set

Here "inc()", "dec()" and "reset()" act like functions which share
the same counting variable.

How do I copy an object in Python?
----------------------------------

Try "copy.copy()" or "copy.deepcopy()" for the general case.  Not all
objects can be copied, but most can.

Some objects can be copied more easily.  Dictionaries have a "copy()"
method:

   newdict = olddict.copy()

Sequences can be copied by slicing:

   new_l = l[:]

How can I find the methods or attributes of an object?
------------------------------------------------------

For an instance "x" of a user-defined class, "dir(x)" returns an
alphabetized list of names: the instance’s own attributes plus the
methods and attributes defined by its class.

How can my code discover the name of an object?
-----------------------------------------------

Generally speaking, it can’t, because objects don’t really have
names.  Essentially, assignment always binds a name to a value; the
same is true of "def" and "class" statements, but in that case the
value is a callable.  Consider the following code:

   >>> class A:
   ...     pass
   ...
   >>> B = A
   >>> a = B()
   >>> b = a
   >>> print(b)
   <__main__.A object at 0x16D07CC>
   >>> print(a)
   <__main__.A object at 0x16D07CC>

Arguably the class has a name: even though it is bound to two names
and invoked through the name "B" the created instance is still
reported as an instance of class "A".  However, it is impossible to
say whether the instance’s name is "a" or "b", since both names are
bound to the same value.

Generally speaking it should not be necessary for your code to “know
the names” of particular values.  Unless you are deliberately writing
introspective programs, this is usually an indication that a change
of approach might be beneficial.

In comp.lang.python, Fredrik Lundh once gave an excellent analogy in
answer to this question:

   The same way as you get the name of that cat you found on your
   porch: the cat (object) itself cannot tell you its name, and it
   doesn’t really care – so the only way to find out what it’s called
   is to ask all your neighbours (namespaces) if it’s their cat
   (object)…

   ….and don’t be surprised if you’ll find that it’s known by many
   names, or no name at all!

What’s up with the comma operator’s precedence?
-----------------------------------------------

Comma is not an operator in Python.  Consider this session:

   >>> "a" in "b", "a"
   (False, 'a')

Since the comma is not an operator, but a separator between
expressions, the above is evaluated as if you had entered:

   ("a" in "b"), "a"

not:

   "a" in ("b", "a")

The same is true of the various assignment operators ("=", "+=" etc).
They are not truly operators but syntactic delimiters in assignment
statements.

Is there an equivalent of C’s “?:” ternary operator?
----------------------------------------------------

Yes, there is.  The syntax is as follows:

   [on_true] if [expression] else [on_false]

   x, y = 50, 25
   small = x if x < y else y

Before this syntax was introduced in Python 2.5, a common idiom was
to use logical operators:

   [expression] and [on_true] or [on_false]

However, this idiom is unsafe, as it can give wrong results when
*on_true* has a false boolean value.  Therefore, it is always better
to use the "... if ... else ..." form.

Is it possible to write obfuscated one-liners in Python?
--------------------------------------------------------

Yes.  Usually this is done by nesting "lambda" within "lambda".  See
the following three examples, slightly adapted from Ulf Bartelt:

   from functools import reduce

   # Primes < 1000
   print(list(filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,
   map(lambda x,y=y:y%x,range(2,int(pow(y,0.5)+1))),1),range(2,1000)))))

   # First 10 Fibonacci numbers
   print(list(map(lambda x,f=lambda x,f:(f(x-1,f)+f(x-2,f)) if x>1 else 1:
   f(x,f), range(10))))

   # Mandelbrot set
   print((lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+'\n'+y,map(lambda y,
   Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
   Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
   i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
   >=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
   64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
   ))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24))
   #    \___ ___/  \___ ___/  |   |   |__ lines on screen
   #        V          V      |   |______ columns on screen
   #        |          |      |__________ maximum of "iterations"
   #        |          |_________________ range on y axis
   #        |____________________________ range on x axis

Don’t try this at home, kids!

What does the slash (/) in the parameter list of a function mean?
-----------------------------------------------------------------

A slash in the argument list of a function denotes that the
parameters prior to it are positional-only.  Positional-only
parameters are the ones without an externally usable name.  Upon
calling a function that accepts positional-only parameters, arguments
are mapped to parameters based solely on their position.  For
example, "divmod()" is a function that accepts positional-only
parameters.  Its documentation looks like this:

   >>> help(divmod)
   Help on built-in function divmod in module builtins:

   divmod(x, y, /)
       Return the tuple (x//y, x%y).  Invariant: div*y + mod == x.

The slash at the end of the parameter list means that both parameters
are positional-only.  Thus, calling "divmod()" with keyword arguments
would lead to an error:

   >>> divmod(x=3, y=4)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: divmod() takes no keyword arguments

Numbers and strings
===================

How do I specify hexadecimal and octal integers?
------------------------------------------------

To specify an octal integer, precede the octal value with a zero, and
then a lower or uppercase “o”.  For example, to set the variable “a”
to the octal value “10” (8 in decimal), type:

   >>> a = 0o10
   >>> a
   8

Hexadecimal is just as easy.  Simply precede the hexadecimal number
with a zero, and then a lower or uppercase “x”.  Hexadecimal digits
can be specified in lower or uppercase.  For example, in the Python
interpreter:

   >>> a = 0xa5
   >>> a
   165
   >>> b = 0XB2
   >>> b
   178

Why does -22 // 10 return -3?
-----------------------------

It’s primarily driven by the desire that "i % j" have the same sign
as "j".  If you want that, and also want:

   i == (i // j) * j + (i % j)

then integer division has to return the floor.  C also requires that
identity to hold, and then compilers that truncate "i // j" need to
make "i % j" have the same sign as "i".

There are few real use cases for "i % j" when "j" is negative.  When
"j" is positive, there are many, and in virtually all of them it’s
more useful for "i % j" to be ">= 0".  If the clock says 10 now, what
did it say 200 hours ago?  "-190 % 12 == 2" is useful; "-190 % 12 ==
-10" is a bug waiting to bite.

How do I get int literal attribute instead of SyntaxError?
----------------------------------------------------------

Trying to lookup an "int" literal attribute in the normal manner
gives a "SyntaxError" because the period is seen as a decimal point:

   >>> 1.__class__
     File "<stdin>", line 1
       1.__class__
        ^
   SyntaxError: invalid decimal literal

The solution is to separate the literal from the period with either a
space or parentheses.

   >>> 1 .__class__
   <class 'int'>
   >>> (1).__class__
   <class 'int'>

How do I convert a string to a number?
--------------------------------------

For integers, use the built-in "int()" type constructor, e.g.
"int('144') == 144".  Similarly, "float()" converts to a floating-
point number, e.g. "float('144') == 144.0".

By default, these interpret the number as decimal, so that
"int('0144') == 144" holds true, and "int('0x144')" raises
"ValueError".  "int(string, base)" takes the base to convert from as
a second optional argument, so "int('0x144', 16) == 324".  If the
base is specified as 0, the number is interpreted using Python’s
rules: a leading ‘0o’ indicates octal, and ‘0x’ indicates a hex
number.

Do not use the built-in function "eval()" if all you need is to
convert strings to numbers.  "eval()" will be significantly slower
and it presents a security risk: someone could pass you a Python
expression that might have unwanted side effects.  For example,
someone could pass "__import__('os').system("rm -rf $HOME")" which
would erase your home directory.

"eval()" also has the effect of interpreting numbers as Python
expressions, so that e.g. "eval('09')" gives a syntax error because
Python does not allow leading ‘0’ in a decimal number (except ‘0’).

How do I convert a number to a string?
--------------------------------------

To convert, e.g., the number "144" to the string "'144'", use the
built-in type constructor "str()".  If you want a hexadecimal or
octal representation, use the built-in functions "hex()" or "oct()".
For fancy formatting, see the f-strings and Format String Syntax
sections, e.g. ""{:04d}".format(144)" yields "'0144'" and
""{:.3f}".format(1.0/3.0)" yields "'0.333'".

How do I modify a string in place?
----------------------------------

You can’t, because strings are immutable.  In most situations, you
should simply construct a new string from the various parts you want
to assemble it from.  However, if you need an object with the ability
to modify in-place unicode data, try using an "io.StringIO" object or
the "array" module:

   >>> import io
   >>> s = "Hello, world"
   >>> sio = io.StringIO(s)
   >>> sio.getvalue()
   'Hello, world'
   >>> sio.seek(7)
   7
   >>> sio.write("there!")
   6
   >>> sio.getvalue()
   'Hello, there!'

   >>> import array
   >>> a = array.array('w', s)
   >>> print(a)
   array('w', 'Hello, world')
   >>> a[0] = 'y'
   >>> print(a)
   array('w', 'yello, world')
   >>> a.tounicode()
   'yello, world'

How do I use strings to call functions/methods?
-----------------------------------------------

There are various techniques.

* The best is to use a dictionary that maps strings to functions.
  The primary advantage of this technique is that the strings do not
  need to match the names of the functions.  This is also the primary
  technique used to emulate a case construct:

     def a():
         pass

     def b():
         pass

     dispatch = {'go': a, 'stop': b}  # Note lack of parens for funcs

     dispatch[get_input()]()  # Note trailing parens to call function

* Use the built-in function "getattr()":

     import foo
     getattr(foo, 'bar')()

  Note that "getattr()" works on any object, including classes, class
  instances, modules, and so on.

  This is used in several places in the standard library, like this:

     class Foo:
         def do_foo(self):
             ...

         def do_bar(self):
             ...

     f = getattr(foo_instance, 'do_' + opname)
     f()

* Use "locals()" to resolve the function name:

     def myFunc():
         print("hello")

     fname = "myFunc"

     f = locals()[fname]
     f()

Is there an equivalent to Perl’s "chomp()" for removing trailing newlines from strings?
---------------------------------------------------------------------------------------

You can use "S.rstrip("\r\n")" to remove all occurrences of any line
terminator from the end of the string "S" without removing other
trailing whitespace.  If the string "S" represents more than one
line, with several empty lines at the end, the line terminators for
all the blank lines will be removed:

   >>> lines = ("line 1 \r\n"
   ...          "\r\n"
   ...          "\r\n")
   >>> lines.rstrip("\n\r")
   'line 1 '

Since this is typically only desired when reading text one line at a
time, using "S.rstrip()" this way works well.

Is there a "scanf()" or "sscanf()" equivalent?
----------------------------------------------

Not as such.

For simple input parsing, the easiest approach is usually to split
the line into whitespace-delimited words using the "split()" method
of string objects and then convert decimal strings to numeric values
using "int()" or "float()".  "split()" supports an optional “sep”
parameter which is useful if the line uses something other than
whitespace as a separator.

For more complicated input parsing, regular expressions are more
powerful than C’s "sscanf" and better suited for the task.

What does "UnicodeDecodeError" or "UnicodeEncodeError" error mean?
------------------------------------------------------------------

See the Unicode HOWTO.

Can I end a raw string with an odd number of backslashes?
---------------------------------------------------------

A raw string ending with an odd number of backslashes will escape the
string’s quote:

   >>> r'C:\this\will\not\work\'
     File "<stdin>", line 1
       r'C:\this\will\not\work\'
                                ^
   SyntaxError: unterminated string literal (detected at line 1)

There are several workarounds for this.  One is to use regular
strings and double the backslashes:

   >>> 'C:\\this\\will\\work\\'
   'C:\\this\\will\\work\\'

Another is to concatenate a regular string containing an escaped
backslash to the raw string:

   >>> r'C:\this\will\work' '\\'
   'C:\\this\\will\\work\\'

It is also possible to use "os.path.join()" to append a backslash on
Windows:

   >>> os.path.join(r'C:\this\will\work', '')
   'C:\\this\\will\\work\\'

Note that while a backslash will “escape” a quote for the purposes of
determining where the raw string ends, no escaping occurs when
interpreting the value of the raw string.  That is, the backslash
remains present in the value of the raw string:

   >>> r'backslash\'preserved'
   "backslash\\'preserved"

Also see the specification in the language reference.

Performance
===========

My program is too slow.
How do I speed it up?
---------------------------------------------

That’s a tough one, in general.  First, here is a list of things to
remember before diving further:

* Performance characteristics vary across Python implementations.
  This FAQ focuses on *CPython*.

* Behaviour can vary across operating systems, especially when
  talking about I/O or multi-threading.

* You should always find the hot spots in your program *before*
  attempting to optimize any code (see the "profile" module).

* Writing benchmark scripts will allow you to iterate quickly when
  searching for improvements (see the "timeit" module).

* It is highly recommended to have good code coverage (through unit
  testing or any other technique) before potentially introducing
  regressions hidden in sophisticated optimizations.

That being said, there are many tricks to speed up Python code.  Here
are some general principles which go a long way towards reaching
acceptable performance levels:

* Making your algorithms faster (or changing to faster ones) can
  yield much larger benefits than trying to sprinkle micro-
  optimization tricks all over your code.

* Use the right data structures.  Study documentation for the
  Built-in Types and the "collections" module.

* When the standard library provides a primitive for doing
  something, it is likely (although not guaranteed) to be faster than
  any alternative you may come up with.  This is doubly true for
  primitives written in C, such as builtins and some extension types.
  For example, be sure to use either the "list.sort()" built-in
  method or the related "sorted()" function to do sorting (and see
  the Sorting Techniques for examples of moderately advanced usage).

* Abstractions tend to create indirections and force the interpreter
  to work more.  If the levels of indirection outweigh the amount of
  useful work done, your program will be slower.  You should avoid
  excessive abstraction, especially in the form of tiny functions or
  methods (which are also often detrimental to readability).

If you have reached the limit of what pure Python can allow, there
are tools that take you further.  For example, Cython can compile a
slightly modified version of Python code into a C extension, and can
be used on many different platforms.  Cython can take advantage of
compilation (and optional type annotations) to make your code
significantly faster than when interpreted.  If you are confident in
your C programming skills, you can also write a C extension module
yourself.

See also: The wiki page devoted to performance tips.

What is the most efficient way to concatenate many strings together?
--------------------------------------------------------------------

"str" and "bytes" objects are immutable, therefore concatenating many
strings together is inefficient as each concatenation creates a new
object.  In the general case, the total runtime cost is quadratic in
the total string length.

To accumulate many "str" objects, the recommended idiom is to place
them into a list and call "str.join()" at the end:

   chunks = []
   for s in my_strings:
       chunks.append(s)
   result = ''.join(chunks)

(another reasonably efficient idiom is to use "io.StringIO")

To accumulate many "bytes" objects, the recommended idiom is to
extend a "bytearray" object using in-place concatenation (the "+="
operator):

   result = bytearray()
   for b in my_bytes_objects:
       result += b
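To compare such idioms on your own machine, the "timeit" module
mentioned earlier is handy.  A minimal sketch (the data size and
repetition count are arbitrary choices, not recommendations):

   import timeit

   setup = "data = ['x'] * 10_000"
   naive = "s = ''\nfor chunk in data:\n    s += chunk"
   joined = "s = ''.join(data)"

   # Each statement is run 100 times against the same setup code.
   print("+= loop :", timeit.timeit(naive, setup=setup, number=100))
   print("''.join :", timeit.timeit(joined, setup=setup, number=100))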
Sequences (Tuples/Lists)
========================

How do I convert between tuples and lists?
------------------------------------------

The type constructor "tuple(seq)" converts any sequence (actually,
any iterable) into a tuple with the same items in the same order.

For example, "tuple([1, 2, 3])" yields "(1, 2, 3)" and "tuple('abc')"
yields "('a', 'b', 'c')".  If the argument is a tuple, it does not
make a copy but returns the same object, so it is cheap to call
"tuple()" when you aren’t sure that an object is already a tuple.

The type constructor "list(seq)" converts any sequence or iterable
into a list with the same items in the same order.  For example,
"list((1, 2, 3))" yields "[1, 2, 3]" and "list('abc')" yields "['a',
'b', 'c']".  If the argument is a list, it makes a copy just like
"seq[:]" would.

What’s a negative index?
------------------------

Python sequences are indexed with positive and negative numbers.  For
positive numbers, 0 is the first index, 1 is the second index, and so
forth.  For negative indices, -1 is the last index, -2 is the
penultimate (next to last) index, and so forth.  Think of "seq[-n]"
as the same as "seq[len(seq)-n]".

Using negative indices can be very convenient.  For example "S[:-1]"
is all of the string except for its last character, which is useful
for removing the trailing newline from a string.

How do I iterate over a sequence in reverse order?
--------------------------------------------------

Use the "reversed()" built-in function:

   for x in reversed(sequence):
       ...  # do something with x ...

This won’t modify your original sequence; it returns a reverse
iterator over it rather than a reversed copy.

How do you remove duplicates from a list?
-----------------------------------------

See the Python Cookbook for a long discussion of many ways to do
this:

   https://code.activestate.com/recipes/52560/

If you don’t mind reordering the list, sort it and then scan from the
end of the list, deleting duplicates as you go:

   if mylist:
       mylist.sort()
       last = mylist[-1]
       for i in range(len(mylist)-2, -1, -1):
           if last == mylist[i]:
               del mylist[i]
           else:
               last = mylist[i]

If all elements of the list may be used as set keys (i.e. they are
all *hashable*), this is often faster:

   mylist = list(set(mylist))

This converts the list into a set, thereby removing duplicates, and
then back into a list.

How do you remove multiple items from a list?
---------------------------------------------

As with removing duplicates, explicitly iterating in reverse with a
delete condition is one possibility.  However, it is easier and
faster to use slice replacement with an implicit or explicit forward
iteration.  Here are three variations:

   mylist[:] = filter(keep_function, mylist)
   mylist[:] = (x for x in mylist if keep_condition)
   mylist[:] = [x for x in mylist if keep_condition]

The list comprehension may be fastest.

How do you make an array in Python?
-----------------------------------

Use a list:

   ["this", 1, "is", "an", "array"]

Lists are equivalent to C or Pascal arrays in their time complexity;
the primary difference is that a Python list can contain objects of
many different types.

The "array" module also provides methods for creating arrays of fixed
types with compact representations, but they are slower to index than
lists.  Also note that NumPy and other third party packages define
array-like structures with various characteristics as well.

To get Lisp-style linked lists, you can emulate *cons cells* using
tuples:

   lisp_list = ("like", ("this", ("example", None)))

If mutability is desired, you could use lists instead of tuples.
Here the analogue of a Lisp *car* is "lisp_list[0]" and the analogue
of *cdr* is "lisp_list[1]".  Only do this if you’re sure you really
need to, because it’s usually a lot slower than using Python lists.

How do I create a multidimensional list?
----------------------------------------

You probably tried to make a multidimensional array like this:

   >>> A = [[None] * 2] * 3

This looks correct if you print it:

   >>> A
   [[None, None], [None, None], [None, None]]

But when you assign a value, it shows up in multiple places:

   >>> A[0][0] = 5
   >>> A
   [[5, None], [5, None], [5, None]]

The reason is that replicating a list with "*" doesn’t create copies,
it only creates references to the existing objects.  The "*3" creates
a list containing 3 references to the same list of length two.
Changes to one row will show in all rows, which is almost certainly
not what you want.

The suggested approach is to create a list of the desired length
first and then fill in each element with a newly created list:

   A = [None] * 3
   for i in range(3):
       A[i] = [None] * 2

This generates a list containing 3 different lists of length two.
You can also use a list comprehension:

   w, h = 2, 3
   A = [[None] * w for i in range(h)]

Or, you can use an extension that provides a matrix datatype; NumPy
is the best known.

How do I apply a method or function to a sequence of objects?
-------------------------------------------------------------

To call a method or function and accumulate the return values in a
list, a *list comprehension* is an elegant solution:

   result = [obj.method() for obj in mylist]

   result = [function(obj) for obj in mylist]

To just run the method or function without saving the return values,
a plain "for" loop will suffice:

   for obj in mylist:
       obj.method()

   for obj in mylist:
       function(obj)

Why does a_tuple[i] += [‘item’] raise an exception when the addition works?
---------------------------------------------------------------------------

This is because of a combination of the fact that augmented
assignment operators are *assignment* operators, and the difference
between mutable and immutable objects in Python.

This discussion applies in general when augmented assignment
operators are applied to elements of a tuple that point to mutable
objects, but we’ll use a "list" and "+=" as our exemplar.

If you wrote:

   >>> a_tuple = (1, 2)
   >>> a_tuple[0] += 1
   Traceback (most recent call last):
     ...
   TypeError: 'tuple' object does not support item assignment

The reason for the exception should be immediately clear: "1" is
added to the object "a_tuple[0]" points to ("1"), producing the
result object, "2", but when we attempt to assign the result of the
computation, "2", to element "0" of the tuple, we get an error
because we can’t change what an element of a tuple points to.

Under the covers, what this augmented assignment statement is doing
is approximately this:

   >>> result = a_tuple[0] + 1
   >>> a_tuple[0] = result
   Traceback (most recent call last):
     ...
   TypeError: 'tuple' object does not support item assignment

It is the assignment part of the operation that produces the error,
since a tuple is immutable.

When you write something like:

   >>> a_tuple = (['foo'], 'bar')
   >>> a_tuple[0] += ['item']
   Traceback (most recent call last):
     ...
   TypeError: 'tuple' object does not support item assignment

The exception is a bit more surprising, and even more surprising is
the fact that even though there was an error, the append worked:

   >>> a_tuple[0]
   ['foo', 'item']

To see why this happens, you need to know that (a) if an object
implements an "__iadd__()" magic method, it gets called when the "+="
augmented assignment is executed, and its return value is what gets
used in the assignment statement; and (b) for lists, "__iadd__()" is
equivalent to calling "extend()" on the list and returning the list.
That’s why we say that for lists, "+=" is a “shorthand” for
"list.extend()":

   >>> a_list = []
   >>> a_list += [1]
   >>> a_list
   [1]

This is equivalent to:

   >>> result = a_list.__iadd__([1])
   >>> a_list = result

The object pointed to by a_list has been mutated, and the pointer to
the mutated object is assigned back to "a_list".  The end result of
the assignment is a no-op, since it is a pointer to the same object
that "a_list" was previously pointing to, but the assignment still
happens.

Thus, in our tuple example what is happening is equivalent to:

   >>> result = a_tuple[0].__iadd__(['item'])
   >>> a_tuple[0] = result
   Traceback (most recent call last):
     ...
   TypeError: 'tuple' object does not support item assignment

The "__iadd__()" succeeds, and thus the list is extended, but even
though "result" points to the same object that "a_tuple[0]" already
points to, that final assignment still results in an error, because
tuples are immutable.

I want to do a complicated sort: can you do a Schwartzian Transform in Python?
------------------------------------------------------------------------------

The technique, attributed to Randal Schwartz of the Perl community,
sorts the elements of a list by a metric which maps each element to
its “sort value”.  In Python, use the "key" argument for the
"list.sort()" method:

   Isorted = L[:]
   Isorted.sort(key=lambda s: int(s[10:15]))

How can I sort one list by values from another list?
----------------------------------------------------

Merge them into an iterator of tuples, sort the resulting list, and
then pick out the element you want.

   >>> list1 = ["what", "I'm", "sorting", "by"]
   >>> list2 = ["something", "else", "to", "sort"]
   >>> pairs = zip(list1, list2)
   >>> pairs = sorted(pairs)
   >>> pairs
   [("I'm", 'else'), ('by', 'sort'), ('sorting', 'to'), ('what', 'something')]
   >>> result = [x[1] for x in pairs]
   >>> result
   ['else', 'sort', 'to', 'something']

Objects
=======

What is a class?
----------------

A class is the particular object type created by executing a class
statement.  Class objects are used as templates to create instance
objects, which embody both the data (attributes) and code (methods)
specific to a datatype.

A class can be based on one or more other classes, called its base
class(es).  It then inherits the attributes and methods of its base
classes.  This allows an object model to be successively refined by
inheritance.  You might have a generic "Mailbox" class that provides
basic accessor methods for a mailbox, and subclasses such as
"MboxMailbox", "MaildirMailbox", "OutlookMailbox" that handle various
specific mailbox formats.

What is a method?
-----------------

A method is a function on some object "x" that you normally call as
"x.name(arguments...)".  Methods are defined as functions inside the
class definition:

   class C:
       def meth(self, arg):
           return arg * 2 + self.attribute

What is self?
-------------

Self is merely a conventional name for the first argument of a
method.
A method defined as "meth(self, a, b, c)" should be called as
"x.meth(a, b, c)" for some instance "x" of the class in which the
definition occurs; the called method will think it is called as
"meth(x, a, b, c)".

See also Why must ‘self’ be used explicitly in method definitions and
calls?.

How do I check if an object is an instance of a given class or of a subclass of it?
-----------------------------------------------------------------------------------

Use the built-in function "isinstance(obj, cls)".  You can check if
an object is an instance of any of a number of classes by providing a
tuple instead of a single class, e.g. "isinstance(obj, (class1,
class2, ...))", and can also check whether an object is one of
Python’s built-in types, e.g. "isinstance(obj, str)" or
"isinstance(obj, (int, float, complex))".

Note that "isinstance()" also checks for virtual inheritance from an
*abstract base class*.  So, the test will return "True" for a
registered class even if it hasn’t directly or indirectly inherited
from it.  To test for “true inheritance”, scan the *MRO* of the
class:

   from collections.abc import Mapping

   class P:
       pass

   class C(P):
       pass

   Mapping.register(P)

   >>> c = C()
   >>> isinstance(c, C)        # direct
   True
   >>> isinstance(c, P)        # indirect
   True
   >>> isinstance(c, Mapping)  # virtual
   True

   # Actual inheritance chain
   >>> type(c).__mro__
   (<class 'C'>, <class 'P'>, <class 'object'>)

   # Test for "true inheritance"
   >>> Mapping in type(c).__mro__
   False

Note that most programs do not use "isinstance()" on user-defined
classes very often.  If you are developing the classes yourself, a
more proper object-oriented style is to define methods on the classes
that encapsulate a particular behaviour, instead of checking the
object’s class and doing a different thing based on what class it is.
For example, if you have a function that does something:

   def search(obj):
       if isinstance(obj, Mailbox):
           ...  # code to search a mailbox
       elif isinstance(obj, Document):
           ...  # code to search a document
       elif ...

A better approach is to define a "search()" method on all the classes
and just call it:

   class Mailbox:
       def search(self):
           ...  # code to search a mailbox

   class Document:
       def search(self):
           ...  # code to search a document

   obj.search()

What is delegation?
-------------------

Delegation is an object oriented technique (also called a design
pattern).  Let’s say you have an object "x" and want to change the
behaviour of just one of its methods.  You can create a new class
that provides a new implementation of the method you’re interested in
changing and delegates all other methods to the corresponding method
of "x".

Python programmers can easily implement delegation.  For example, the
following class implements a class that behaves like a file but
converts all written data to uppercase:

   class UpperOut:

       def __init__(self, outfile):
           self._outfile = outfile

       def write(self, s):
           self._outfile.write(s.upper())

       def __getattr__(self, name):
           return getattr(self._outfile, name)

Here the "UpperOut" class redefines the "write()" method to convert
the argument string to uppercase before calling the underlying
"self._outfile.write()" method.  All other methods are delegated to
the underlying "self._outfile" object.  The delegation is
accomplished via the "__getattr__()" method; consult the language
reference for more information about controlling attribute access.

Note that for more general cases delegation can get trickier.  When
attributes must be set as well as retrieved, the class must define a
"__setattr__()" method too, and it must do so carefully.
The basic implementation of "__setattr__()" is roughly equivalent to
the following:

   class X:
       ...
       def __setattr__(self, name, value):
           self.__dict__[name] = value
       ...

Many "__setattr__()" implementations call "object.__setattr__()" to
set an attribute on self without causing infinite recursion:

   class X:
       def __setattr__(self, name, value):
           # Custom logic here...
           object.__setattr__(self, name, value)

Alternatively, it is possible to set attributes by inserting entries
into "self.__dict__" directly.

How do I call a method defined in a base class from a derived class that extends it?
-------------------------------------------------------------------------------------

Use the built-in "super()" function:

   class Derived(Base):
       def meth(self):
           super().meth()  # calls Base.meth

In the example, "super()" will automatically determine the instance
from which it was called (the "self" value), look up the *method
resolution order* (MRO) with "type(self).__mro__", and return the
next in line after "Derived" in the MRO: "Base".

How can I organize my code to make it easier to change the base class?
-----------------------------------------------------------------------

You could assign the base class to an alias and derive from the
alias.  Then all you have to change is the value assigned to the
alias.  Incidentally, this trick is also handy if you want to decide
dynamically (e.g. depending on availability of resources) which base
class to use.  Example:

   class Base:
       ...

   BaseAlias = Base

   class Derived(BaseAlias):
       ...

How do I create static class data and static class methods?
------------------------------------------------------------

Both static data and static methods (in the sense of C++ or Java) are
supported in Python.

For static data, simply define a class attribute.  To assign a new
value to the attribute, you have to explicitly use the class name in
the assignment:

   class C:
       count = 0   # number of times C.__init__ called

       def __init__(self):
           C.count = C.count + 1

       def getcount(self):
           return C.count  # or return self.count

"c.count" also refers to "C.count" for any "c" such that
"isinstance(c, C)" holds, unless overridden by "c" itself or by some
class on the base-class search path from "c.__class__" back to "C".

Caution: within a method of C, an assignment like "self.count = 42"
creates a new and unrelated instance attribute named “count” in
"self"’s own dict.  Rebinding of a class-static data name must always
specify the class whether inside a method or not:

   C.count = 314

Static methods are possible:

   class C:
       @staticmethod
       def static(arg1, arg2, arg3):
           # No 'self' parameter!
           ...

However, a far more straightforward way to get the effect of a static
method is via a simple module-level function:

   def getcount():
       return C.count

If your code is structured so as to define one class (or tightly
related class hierarchy) per module, this supplies the desired
encapsulation.

How can I overload constructors (or methods) in Python?
--------------------------------------------------------

This answer actually applies to all methods, but the question usually
comes up first in the context of constructors.

In C++ you’d write

   class C {
       C() { cout << "No arguments\n"; }
       C(int i) { cout << "Argument is " << i << "\n"; }
   };

In Python you have to write a single constructor that catches all
cases using default arguments.  For example:

   class C:
       def __init__(self, i=None):
           if i is None:
               print("No arguments")
           else:
               print("Argument is", i)

This is not entirely equivalent, but close enough in practice.
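For instance, a quick interactive check of the class above shows the
single constructor handling both call styles:

   >>> c1 = C()
   No arguments
   >>> c2 = C(42)
   Argument is 42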
You could also try a variable-length argument list, e.g.

   def __init__(self, *args):
       ...

The same approach works for all method definitions.

I try to use __spam and I get an error about _SomeClassName__spam.
------------------------------------------------------------------

Variable names with double leading underscores are “mangled” to
provide a simple but effective way to define class private variables.
Any identifier of the form "__spam" (at least two leading
underscores, at most one trailing underscore) is textually replaced
with "_classname__spam", where "classname" is the current class name
with any leading underscores stripped.

The identifier can be used unchanged within the class, but to access
it outside the class, the mangled name must be used:

   class A:
       def __one(self):
           return 1

       def two(self):
           return 2 * self.__one()

   class B(A):
       def three(self):
           return 3 * self._A__one()

   four = 4 * A()._A__one()

In particular, this does not guarantee privacy since an outside user
can still deliberately access the private attribute; many Python
programmers never bother to use private variable names at all.

See also: The private name mangling specifications for details and
special cases.

My class defines __del__ but it is not called when I delete the object.
------------------------------------------------------------------------

There are several possible reasons for this.

The "del" statement does not necessarily call "__del__()" – it simply
decrements the object’s reference count, and if this reaches zero
"__del__()" is called.

If your data structures contain circular links (e.g. a tree where
each child has a parent reference and each parent has a list of
children) the reference counts will never go back to zero.  Once in a
while Python runs an algorithm to detect such cycles, but the garbage
collector might run some time after the last reference to your data
structure vanishes, so your "__del__()" method may be called at an
inconvenient and random time.  This is inconvenient if you’re trying
to reproduce a problem.  Worse, the order in which objects’
"__del__()" methods are executed is arbitrary.  You can run
"gc.collect()" to force a collection, but there *are* pathological
cases where objects will never be collected.

Despite the cycle collector, it’s still a good idea to define an
explicit "close()" method on objects to be called whenever you’re
done with them.  The "close()" method can then remove attributes that
refer to subobjects.  Don’t call "__del__()" directly – "__del__()"
should call "close()" and "close()" should make sure that it can be
called more than once for the same object.

Another way to avoid cyclical references is to use the "weakref"
module, which allows you to point to objects without incrementing
their reference count.  Tree data structures, for instance, should
use weak references for their parent and sibling references (if they
need them!).

Finally, if your "__del__()" method raises an exception, a warning
message is printed to "sys.stderr".

How do I get a list of all instances of a given class?
-------------------------------------------------------

Python does not keep track of all instances of a class (or of a
built-in type).  You can program the class’s constructor to keep
track of all instances by keeping a list of weak references to each
instance.
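A minimal sketch of that weak-reference bookkeeping, using
"weakref.WeakSet" (the class name is purely illustrative):

   import weakref

   class Tracked:
       _instances = weakref.WeakSet()  # does not keep instances alive

       def __init__(self):
           # Register every new instance; an entry disappears
           # automatically once its instance is garbage-collected.
           Tracked._instances.add(self)

       @classmethod
       def all_instances(cls):
           return list(cls._instances)

   a = Tracked()
   b = Tracked()
   len(Tracked.all_instances())  # 2
   del a                         # in CPython (absent cycles) 'a' is
                                 # collected at once, leaving only 'b'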
Why does the result of "id()" appear to be not unique?
------------------------------------------------------

The "id()" builtin returns an integer that is guaranteed to be unique
during the lifetime of the object.  Since in CPython, this is the
object’s memory address, it happens frequently that after an object
is deleted from memory, the next freshly created object is allocated
at the same position in memory.  This is illustrated by this example:

   >>> id(1000)
   13901272
   >>> id(2000)
   13901272

The two ids belong to different integer objects that are created
before the "id()" call and deleted immediately after it.  To be sure
that objects whose id you want to examine are still alive, create
another reference to the object:

   >>> a = 1000; b = 2000
   >>> id(a)
   13901272
   >>> id(b)
   13891296

When can I rely on identity tests with the *is* operator?
---------------------------------------------------------

The "is" operator tests for object identity.  The test "a is b" is
equivalent to "id(a) == id(b)".

The most important property of an identity test is that an object is
always identical to itself, "a is a" always returns "True".  Identity
tests are usually faster than equality tests.  And unlike equality
tests, identity tests are guaranteed to return a boolean "True" or
"False".

However, identity tests can *only* be substituted for equality tests
when object identity is assured.  Generally, there are three
circumstances where identity is guaranteed:

1. Assignments create new names but do not change object identity.
   After the assignment "new = old", it is guaranteed that
   "new is old".

2. Putting an object in a container that stores object references
   does not change object identity.  After the list assignment
   "s[0] = x", it is guaranteed that "s[0] is x".

3. If an object is a singleton, it means that only one instance of
   that object can exist.  After the assignments "a = None" and
   "b = None", it is guaranteed that "a is b" because "None" is a
   singleton.

In most other circumstances, identity tests are inadvisable and
equality tests are preferred.  In particular, identity tests should
not be used to check constants such as "int" and "str" which aren’t
guaranteed to be singletons:

   >>> a = 1000
   >>> b = 500
   >>> c = b + 500
   >>> a is c
   False

   >>> a = 'Python'
   >>> b = 'Py'
   >>> c = b + 'thon'
   >>> a is c
   False

Likewise, new instances of mutable containers are never identical:

   >>> a = []
   >>> b = []
   >>> a is b
   False

In the standard library code, you will see several common patterns
for correctly using identity tests:

1. As recommended by **PEP 8**, an identity test is the preferred way
   to check for "None".  This reads like plain English in code and
   avoids confusion with other objects that may have boolean values
   that evaluate to false.

2. Detecting optional arguments can be tricky when "None" is a valid
   input value.  In those situations, you can create a singleton
   sentinel object guaranteed to be distinct from other objects.  For
   example, here is how to implement a method that behaves like
   "dict.pop()":

      _sentinel = object()

      def pop(self, key, default=_sentinel):
          if key in self:
              value = self[key]
              del self[key]
              return value
          if default is _sentinel:
              raise KeyError(key)
          return default

3. Container implementations sometimes need to augment equality tests
   with identity tests.  This prevents the code from being confused
   by objects such as "float('NaN')" that are not equal to
   themselves.  For example, here is the implementation of
   "collections.abc.Sequence.__contains__()":

      def __contains__(self, value):
          for v in self:
              if v is value or v == value:
                  return True
          return False

How can a subclass control what data is stored in an immutable instance?
How can a subclass control what data is stored in an immutable instance?
-------------------------------------------------------------------------

When subclassing an immutable type, override the "__new__()" method instead of the "__init__()" method. The latter only runs *after* an instance is created, which is too late to alter data in an immutable instance.

All of these immutable classes have a different signature than their parent class:

   from datetime import date

   class FirstOfMonthDate(date):
       "Always choose the first day of the month"
       def __new__(cls, year, month, day):
           return super().__new__(cls, year, month, 1)

   class NamedInt(int):
       "Allow text names for some numbers"
       xlat = {'zero': 0, 'one': 1, 'ten': 10}
       def __new__(cls, value):
           value = cls.xlat.get(value, value)
           return super().__new__(cls, value)

   class TitleStr(str):
       "Convert str to name suitable for a URL path"
       def __new__(cls, s):
           s = s.lower().replace(' ', '-')
           s = ''.join([c for c in s if c.isalnum() or c == '-'])
           return super().__new__(cls, s)

The classes can be used like this:

   >>> FirstOfMonthDate(2012, 2, 14)
   FirstOfMonthDate(2012, 2, 1)
   >>> NamedInt('ten')
   10
   >>> NamedInt(20)
   20
   >>> TitleStr('Blog: Why Python Rocks')
   'blog-why-python-rocks'


How do I cache method calls?
----------------------------

The two principal tools for caching methods are "functools.cached_property()" and "functools.lru_cache()". The former stores results at the instance level and the latter at the class level.

The *cached_property* approach only works with methods that do not take any arguments. It does not create a reference to the instance. The cached method result will be kept only as long as the instance is alive.

The advantage is that when an instance is no longer used, the cached method result will be released right away. The disadvantage is that if instances accumulate, so too will the accumulated method results. They can grow without bound.

The *lru_cache* approach works with methods that have *hashable* arguments. It creates a reference to the instance unless special efforts are made to pass in weak references.

The advantage of the least recently used algorithm is that the cache is bounded by the specified *maxsize*. The disadvantage is that instances are kept alive until they age out of the cache or until the cache is cleared.

This example shows the various techniques:

   from functools import cached_property, lru_cache

   class Weather:
       "Lookup weather information on a government website"

       def __init__(self, station_id):
           self._station_id = station_id
           # The _station_id is private and immutable

       def current_temperature(self):
           "Latest hourly observation"
           # Do not cache this because old results
           # can be out of date.

       @cached_property
       def location(self):
           "Return the longitude/latitude coordinates of the station"
           # Result only depends on the station_id

       @lru_cache(maxsize=20)
       def historic_rainfall(self, date, units='mm'):
           "Rainfall on a given date"
           # Depends on the station_id, date, and units.

The above example assumes that the *station_id* never changes. If the relevant instance attributes are mutable, the *cached_property* approach can’t be made to work because it cannot detect changes to the attributes.
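To see why, consider this small illustrative sketch (the "Square" class is hypothetical): once the property has been computed, *cached_property* stores the result in the instance dictionary and never recomputes it, even when the attribute it depends on changes:

   >>> from functools import cached_property
   >>> class Square:
   ...     def __init__(self, side):
   ...         self.side = side
   ...     @cached_property
   ...     def area(self):
   ...         return self.side ** 2
   ...
   >>> s = Square(3)
   >>> s.area
   9
   >>> s.side = 4
   >>> s.area          # stale: the cached value was not invalidated
   9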
To make the *lru_cache* approach work when the *station_id* is mutable, the class needs to define the "__eq__()" and "__hash__()" methods so that the cache can detect relevant attribute updates:

   from functools import lru_cache

   class Weather:
       "Example with a mutable station identifier"

       def __init__(self, station_id):
           self.station_id = station_id

       def change_station(self, station_id):
           self.station_id = station_id

       def __eq__(self, other):
           return self.station_id == other.station_id

       def __hash__(self):
           return hash(self.station_id)

       @lru_cache(maxsize=20)
       def historic_rainfall(self, date, units='cm'):
           'Rainfall on a given date'
           # Depends on the station_id, date, and units.


Modules
=======

How do I create a .pyc file?
----------------------------

When a module is imported for the first time (or when the source file has changed since the current compiled file was created) a ".pyc" file containing the compiled code should be created in a "__pycache__" subdirectory of the directory containing the ".py" file. The ".pyc" file will have a filename that starts with the same name as the ".py" file, and ends with ".pyc", with a middle component that depends on the particular "python" binary that created it. (See **PEP 3147** for details.)

One reason that a ".pyc" file may not be created is a permissions problem with the directory containing the source file, meaning that the "__pycache__" subdirectory cannot be created. This can happen, for example, if you develop as one user but run as another, such as if you are testing with a web server.

Unless the "PYTHONDONTWRITEBYTECODE" environment variable is set, creation of a .pyc file is automatic if you’re importing a module and Python has the ability (permissions, free space, etc…) to create a "__pycache__" subdirectory and write the compiled module to that subdirectory.

Running Python on a top level script is not considered an import and no ".pyc" will be created. For example, if you have a top-level module "foo.py" that imports another module "xyz.py", when you run "foo" (by typing "python foo.py" as a shell command), a ".pyc" will be created for "xyz" because "xyz" is imported, but no ".pyc" file will be created for "foo" since "foo.py" isn’t being imported.

If you need to create a ".pyc" file for "foo" – that is, to create a ".pyc" file for a module that is not imported – you can, using the "py_compile" and "compileall" modules.

The "py_compile" module can manually compile any module. One way is to use the "compile()" function in that module interactively:

   >>> import py_compile
   >>> py_compile.compile('foo.py')

This will write the ".pyc" to a "__pycache__" subdirectory in the same location as "foo.py" (or you can override that with the optional parameter "cfile").

You can also automatically compile all files in a directory or directories using the "compileall" module. You can do it from the shell prompt by running "compileall.py" and providing the path of a directory containing Python files to compile:

   python -m compileall .


How do I find the current module name?
--------------------------------------

A module can find out its own module name by looking at the predefined global variable "__name__". If this has the value "'__main__'", the program is running as a script. Many modules that are usually used by importing them also provide a command-line interface or a self-test, and only execute this code after checking "__name__":

   def main():
       print('Running test...')
       ...

   if __name__ == '__main__':
       main()


How can I have modules that mutually import each other?
-------------------------------------------------------

Suppose you have the following modules:

"foo.py":

   from bar import bar_var
   foo_var = 1

"bar.py":

   from foo import foo_var
   bar_var = 2

The problem is that the interpreter will perform the following steps:

* main imports "foo"
* Empty globals for "foo" are created
* "foo" is compiled and starts executing
* "foo" imports "bar"
* Empty globals for "bar" are created
* "bar" is compiled and starts executing
* "bar" imports "foo" (which is a no-op since there already is a module named "foo")
* The import mechanism tries to read "foo_var" from "foo" globals, to set "bar.foo_var = foo.foo_var"

The last step fails, because Python isn’t done with interpreting "foo" yet and the global symbol dictionary for "foo" is still empty.

The same thing happens when you use "import foo", and then try to access "foo.foo_var" in global code.

There are (at least) three possible workarounds for this problem.

Guido van Rossum recommends avoiding all uses of "from <module> import ...", and placing all code inside functions. Initializations of global variables and class variables should use constants or built-in functions only. This means everything from an imported module is referenced as "<module>.<name>".

Jim Roskind suggests performing steps in the following order in each module:

* exports (globals, functions, and classes that don’t need imported base classes)
* "import" statements
* active code (including globals that are initialized from imported values).

Van Rossum doesn’t like this approach much because the imports appear in a strange place, but it does work.

Matthias Urlichs recommends restructuring your code so that the recursive import is not necessary in the first place.

These solutions are not mutually exclusive.
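For instance, one common form of the restructuring approach (a hedged sketch using the hypothetical "foo"/"bar" modules from above) is to defer one of the imports into the function that needs it, so the import only runs after both modules have finished executing:

   # foo.py
   foo_var = 1

   def get_bar_var():
       from bar import bar_var   # deferred: runs at call time, not at import time
       return bar_var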
__import__('x.y.z') returns <module 'x'>; how do I get z?
---------------------------------------------------------

Consider using the convenience function "import_module()" from "importlib" instead:

   z = importlib.import_module('x.y.z')


When I edit an imported module and reimport it, the changes don’t show up. Why does this happen?
-------------------------------------------------------------------------------------------------

For reasons of efficiency as well as consistency, Python only reads the module file on the first time a module is imported. If it didn’t, in a program consisting of many modules where each one imports the same basic module, the basic module would be parsed and re-parsed many times. To force re-reading of a changed module, do this:

   import importlib
   import modname
   importlib.reload(modname)

Warning: this technique is not 100% fool-proof. In particular, modules containing statements like

   from modname import some_objects

will continue to work with the old version of the imported objects. If the module contains class definitions, existing class instances will *not* be updated to use the new class definition. This can result in the following paradoxical behaviour:

   >>> import importlib
   >>> import cls
   >>> c = cls.C()                # Create an instance of C
   >>> importlib.reload(cls)
   >>> isinstance(c, cls.C)       # isinstance is false?!?
   False

The nature of the problem is made clear if you print out the “identity” of the class objects:

   >>> hex(id(c.__class__))
   '0x7352a0'
   >>> hex(id(cls.C))
   '0x4198d0'


Python on Windows FAQ
*********************

How do I run a Python program under Windows?
============================================
This is not necessarily a straightforward question. If you are already familiar with running programs from the Windows command line then everything will seem obvious; otherwise, you might need a little more guidance.

Unless you use some sort of integrated development environment, you will end up *typing* Windows commands into what is referred to as a “Command prompt window”. Usually you can create such a window from your search bar by searching for "cmd". You should be able to recognize when you have started such a window because you will see a Windows “command prompt”, which usually looks like this:

   C:\>

The letter may be different, and there might be other things after it, so you might just as easily see something like:

   D:\YourName\Projects\Python>

depending on how your computer has been set up and what else you have recently done with it. Once you have started such a window, you are well on the way to running Python programs.

You need to realize that your Python scripts have to be processed by another program called the Python *interpreter*. The interpreter reads your script, compiles it into bytecodes, and then executes the bytecodes to run your program. So, how do you arrange for the interpreter to handle your Python?

First, you need to make sure that your command window recognises the word “py” as an instruction to start the interpreter. If you have opened a command window, you should try entering the command "py" and hitting return:

   C:\Users\YourName> py

You should then see something like:

   Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>>

You have started the interpreter in “interactive mode”. That means you can enter Python statements or expressions interactively and have them executed or evaluated while you wait. This is one of Python’s strongest features. Check it by entering a few expressions of your choice and seeing the results:

   >>> print("Hello")
   Hello
   >>> "Hello" * 3
   'HelloHelloHello'

Many people use the interactive mode as a convenient yet highly programmable calculator. When you want to end your interactive Python session, call the "exit()" function or hold the "Ctrl" key down while you enter a "Z", then hit the "Enter" key to get back to your Windows command prompt.

You may also find that you have a Start-menu entry such as Start ‣ Programs ‣ Python 3.x ‣ Python (command line) that results in you seeing the ">>>" prompt in a new window. If so, the window will disappear after you call the "exit()" function or enter the "Ctrl"-"Z" character; Windows is running a single “python” command in the window, and closes it when you terminate the interpreter.

Now that we know the "py" command is recognized, you can give your Python script to it. You’ll have to give either an absolute or a relative path to the Python script. Let’s say your Python script is located on your desktop and is named "hello.py", and your command prompt is nicely opened in your home directory so you’re seeing something similar to:

   C:\Users\YourName>

So now you’ll ask the "py" command to give your script to Python by typing "py" followed by your script path:

   C:\Users\YourName> py Desktop\hello.py
   hello


How do I make Python scripts executable?
========================================

On Windows, the standard Python installer already associates the .py extension with a file type (Python.File) and gives that file type an open command that runs the interpreter ("D:\Program Files\Python\python.exe "%1" %*").
This is enough to make scripts executable from the command prompt as ‘foo.py’. If you’d rather be able to execute the script by simply typing ‘foo’ with no extension you need to add .py to the PATHEXT environment variable.


Why does Python sometimes take so long to start?
================================================

Usually Python starts very quickly on Windows, but occasionally there are bug reports that Python suddenly begins to take a long time to start up. This is made even more puzzling because Python will work fine on other Windows systems which appear to be configured identically.

The problem may be caused by a misconfiguration of virus checking software on the problem machine. Some virus scanners have been known to introduce startup overhead of two orders of magnitude when the scanner is configured to monitor all reads from the filesystem. Try checking the configuration of virus scanning software on your systems to ensure that they are indeed configured identically. McAfee, when configured to scan all file system read activity, is a particular offender.


How do I make an executable from a Python script?
=================================================

See How can I create a stand-alone binary from a Python script? for a list of tools that can be used to make executables.


Is a "*.pyd" file the same as a DLL?
====================================

Yes, .pyd files are DLLs, but there are a few differences. If you have a DLL named "foo.pyd", then it must have a function "PyInit_foo()". You can then write Python “import foo”, and Python will search for foo.pyd (as well as foo.py, foo.pyc) and if it finds it, will attempt to call "PyInit_foo()" to initialize it. You do not link your .exe with foo.lib, as that would cause Windows to require the DLL to be present.

Note that the search path for foo.pyd is PYTHONPATH, not the same as the path that Windows uses to search for foo.dll. Also, foo.pyd need not be present to run your program, whereas if you linked your program with a dll, the dll is required. Of course, foo.pyd is required if you want to say "import foo". In a DLL, linkage is declared in the source code with "__declspec(dllexport)". In a .pyd, linkage is defined in a list of available functions.


How can I embed Python into a Windows application?
==================================================

Embedding the Python interpreter in a Windows app can be summarized as follows:

1. Do **not** build Python into your .exe file directly. On Windows, Python must be a DLL to handle importing modules that are themselves DLLs. (This is the first key undocumented fact.) Instead, link to "python*NN*.dll"; it is typically installed in "C:\Windows\System". *NN* is the Python version, a number such as “33” for Python 3.3.

   You can link to Python in two different ways. Load-time linking means linking against "python*NN*.lib", while run-time linking means linking against "python*NN*.dll". (General note: "python*NN*.lib" is the so-called “import lib” corresponding to "python*NN*.dll". It merely defines symbols for the linker.)

   Run-time linking greatly simplifies link options; everything happens at run time. Your code must load "python*NN*.dll" using the Windows "LoadLibraryEx()" routine. The code must also use access routines and data in "python*NN*.dll" (that is, Python’s C API’s) using pointers obtained by the Windows "GetProcAddress()" routine. Macros can make using these pointers transparent to any C code that calls routines in Python’s C API.
2. If you use SWIG, it is easy to create a Python “extension module” that will make the app’s data and methods available to Python. SWIG will handle just about all the grungy details for you. The result is C code that you link *into* your .exe file (!) You do **not** have to create a DLL file, and this also simplifies linking.

3. SWIG will create an init function (a C function) whose name depends on the name of the extension module. For example, if the name of the module is leo, the init function will be called initleo(). If you use SWIG shadow classes, as you should, the init function will be called initleoc(). This initializes a mostly hidden helper class used by the shadow class.

   The reason you can link the C code in step 2 into your .exe file is that calling the initialization function is equivalent to importing the module into Python! (This is the second key undocumented fact.)

4. In short, you can use the following code to initialize the Python interpreter with your extension module.

      #include <Python.h>
      ...
      Py_Initialize();                     // Initialize Python.
      initmyAppc();                        // Initialize (import) the helper class.
      PyRun_SimpleString("import myApp");  // Import the shadow class.

5. There are two problems with Python’s C API which will become apparent if you use a compiler other than MSVC, the compiler used to build pythonNN.dll.

   Problem 1: The so-called “Very High Level” functions that take "FILE *" arguments will not work in a multi-compiler environment because each compiler’s notion of a "struct FILE" will be different. From an implementation standpoint these are very low level functions.

   Problem 2: SWIG generates the following code when generating wrappers to void functions:

      Py_INCREF(Py_None);
      _resultobj = Py_None;
      return _resultobj;

   Alas, Py_None is a macro that expands to a reference to a complex data structure called _Py_NoneStruct inside pythonNN.dll. Again, this code will fail in a multi-compiler environment. Replace such code by:

      return Py_BuildValue("");

   It may be possible to use SWIG’s "%typemap" command to make the change automatically, though I have not been able to get this to work (I’m a complete SWIG newbie).

6. Using a Python shell script to put up a Python interpreter window from inside your Windows app is not a good idea; the resulting window will be independent of your app’s windowing system. Rather, you (or the wxPythonWindow class) should create a “native” interpreter window. It is easy to connect that window to the Python interpreter. You can redirect Python’s i/o to *any* object that supports read and write, so all you need is a Python object (defined in your extension module) that contains read() and write() methods.


How do I keep editors from inserting tabs into my Python source?
================================================================

The FAQ does not recommend using tabs, and the Python style guide, **PEP 8**, recommends 4 spaces for distributed Python code; this is also the Emacs python-mode default.

Under any editor, mixing tabs and spaces is a bad idea. MSVC is no different in this respect, and is easily configured to use spaces: Take Tools ‣ Options ‣ Tabs, and for file type “Default” set “Tab size” and “Indent size” to 4, and select the “Insert spaces” radio button.

Python raises "IndentationError" or "TabError" if mixed tabs and spaces are causing problems in leading whitespace. You may also run the "tabnanny" module to check a directory tree in batch mode.


How do I check for a keypress without blocking?
===============================================

Use the "msvcrt" module. This is a standard Windows-specific extension module. It defines a function "kbhit()" which checks whether a keyboard hit is present, and "getch()" which gets one character without echoing it.
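As a minimal sketch of how those two functions are typically combined (the polling interval and messages are illustrative):

   import msvcrt   # Windows-only standard module
   import time

   while True:
       if msvcrt.kbhit():          # a keypress is waiting to be read
           key = msvcrt.getch()    # read one keystroke without echoing it (returns bytes)
           print('you pressed', key)
           break
       time.sleep(0.1)             # no key yet; keep doing other work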
How do I solve the missing api-ms-win-crt-runtime-l1-1-0.dll error?
===================================================================

This can occur on Python 3.5 and later when using Windows 8.1 or earlier without all updates having been installed. First ensure your operating system is supported and is up to date, and if that does not resolve the issue, visit the Microsoft support page for guidance on manually installing the C Runtime update.


Glossary
********

">>>"
   The default Python prompt of the *interactive* shell. Often seen for code examples which can be executed interactively in the interpreter.

"..."
   Can refer to:

   * The default Python prompt of the *interactive* shell when entering the code for an indented code block, when within a pair of matching left and right delimiters (parentheses, square brackets, curly braces or triple quotes), or after specifying a decorator.

   * The "Ellipsis" built-in constant.

abstract base class
   Abstract base classes complement *duck-typing* by providing a way to define interfaces when other techniques like "hasattr()" would be clumsy or subtly wrong (for example with magic methods). ABCs introduce virtual subclasses, which are classes that don’t inherit from a class but are still recognized by "isinstance()" and "issubclass()"; see the "abc" module documentation. Python comes with many built-in ABCs for data structures (in the "collections.abc" module), numbers (in the "numbers" module), streams (in the "io" module), import finders and loaders (in the "importlib.abc" module). You can create your own ABCs with the "abc" module.

annotation
   A label associated with a variable, a class attribute or a function parameter or return value, used by convention as a *type hint*.

   Annotations of local variables cannot be accessed at runtime, but annotations of global variables, class attributes, and functions are stored in the "__annotations__" special attribute of modules, classes, and functions, respectively.

   See *variable annotation*, *function annotation*, **PEP 484** and **PEP 526**, which describe this functionality. Also see Annotations Best Practices for best practices on working with annotations.

argument
   A value passed to a *function* (or *method*) when calling the function. There are two kinds of argument:

   * *keyword argument*: an argument preceded by an identifier (e.g. "name=") in a function call or passed as a value in a dictionary preceded by "**". For example, "3" and "5" are both keyword arguments in the following calls to "complex()":

        complex(real=3, imag=5)
        complex(**{'real': 3, 'imag': 5})

   * *positional argument*: an argument that is not a keyword argument. Positional arguments can appear at the beginning of an argument list and/or be passed as elements of an *iterable* preceded by "*". For example, "3" and "5" are both positional arguments in the following calls:

        complex(3, 5)
        complex(*(3, 5))

   Arguments are assigned to the named local variables in a function body. See the Calls section for the rules governing this assignment. Syntactically, any expression can be used to represent an argument; the evaluated value is assigned to the local variable.

   See also the *parameter* glossary entry, the FAQ question on the difference between arguments and parameters, and **PEP 362**.
asynchronous context manager
   An object which controls the environment seen in an "async with" statement by defining "__aenter__()" and "__aexit__()" methods. Introduced by **PEP 492**.

asynchronous generator
   A function which returns an *asynchronous generator iterator*. It looks like a coroutine function defined with "async def" except that it contains "yield" expressions for producing a series of values usable in an "async for" loop.

   Usually refers to an asynchronous generator function, but may refer to an *asynchronous generator iterator* in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.

   An asynchronous generator function may contain "await" expressions as well as "async for", and "async with" statements.

asynchronous generator iterator
   An object created by an *asynchronous generator* function.

   This is an *asynchronous iterator* which when called using the "__anext__()" method returns an awaitable object which will execute the body of the asynchronous generator function until the next "yield" expression.

   Each "yield" temporarily suspends processing, remembering the execution state (including local variables and pending try-statements). When the *asynchronous generator iterator* effectively resumes with another awaitable returned by "__anext__()", it picks up where it left off. See **PEP 492** and **PEP 525**.

asynchronous iterable
   An object that can be used in an "async for" statement. Must return an *asynchronous iterator* from its "__aiter__()" method. Introduced by **PEP 492**.

asynchronous iterator
   An object that implements the "__aiter__()" and "__anext__()" methods. "__anext__()" must return an *awaitable* object. "async for" resolves the awaitables returned by an asynchronous iterator’s "__anext__()" method until it raises a "StopAsyncIteration" exception. Introduced by **PEP 492**.
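   A small runnable sketch tying these terms together (the "countdown" generator and the delay are illustrative): calling the asynchronous generator function returns an asynchronous generator iterator, which "async for" then drives:

      import asyncio

      async def countdown(n):
          "An asynchronous generator function."
          while n > 0:
              await asyncio.sleep(0.1)   # an await expression is allowed here
              yield n                    # makes this an asynchronous generator
              n -= 1

      async def main():
          async for value in countdown(3):   # consumes the asynchronous iterator
              print(value)

      asyncio.run(main())   # prints 3, 2, 1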
attribute
   A value associated with an object which is usually referenced by name using dotted expressions. For example, if an object *o* has an attribute *a* it would be referenced as *o.a*.

   It is possible to give an object an attribute whose name is not an identifier as defined by Identifiers and keywords, for example using "setattr()", if the object allows it. Such an attribute will not be accessible using a dotted expression, and would instead need to be retrieved with "getattr()".

awaitable
   An object that can be used in an "await" expression. Can be a *coroutine* or an object with an "__await__()" method. See also **PEP 492**.

BDFL
   Benevolent Dictator For Life, a.k.a. Guido van Rossum, Python’s creator.

binary file
   A *file object* able to read and write *bytes-like objects*. Examples of binary files are files opened in binary mode ("'rb'", "'wb'" or "'rb+'"), "sys.stdin.buffer", "sys.stdout.buffer", and instances of "io.BytesIO" and "gzip.GzipFile".

   See also *text file* for a file object able to read and write "str" objects.

borrowed reference
   In Python’s C API, a borrowed reference is a reference to an object, where the code using the object does not own the reference. It becomes a dangling pointer if the object is destroyed. For example, a garbage collection can remove the last *strong reference* to the object and so destroy it.

   Calling "Py_INCREF()" on the *borrowed reference* is recommended to convert it to a *strong reference* in-place, except when the object cannot be destroyed before the last usage of the borrowed reference. The "Py_NewRef()" function can be used to create a new *strong reference*.

bytes-like object
   An object that supports the Buffer Protocol and can export a C-*contiguous* buffer. This includes all "bytes", "bytearray", and "array.array" objects, as well as many common "memoryview" objects. Bytes-like objects can be used for various operations that work with binary data; these include compression, saving to a binary file, and sending over a socket.

   Some operations need the binary data to be mutable. The documentation often refers to these as “read-write bytes-like objects”. Example mutable buffer objects include "bytearray" and a "memoryview" of a "bytearray". Other operations require the binary data to be stored in immutable objects (“read-only bytes-like objects”); examples of these include "bytes" and a "memoryview" of a "bytes" object.

bytecode
   Python source code is compiled into bytecode, the internal representation of a Python program in the CPython interpreter. The bytecode is also cached in ".pyc" files so that executing the same file is faster the second time (recompilation from source to bytecode can be avoided). This “intermediate language” is said to run on a *virtual machine* that executes the machine code corresponding to each bytecode. Do note that bytecodes are not expected to work between different Python virtual machines, nor to be stable between Python releases.

   A list of bytecode instructions can be found in the documentation for the dis module.

callable
   A callable is an object that can be called, possibly with a set of arguments (see *argument*), with the following syntax:

      callable(argument1, argument2, argumentN)

   A *function*, and by extension a *method*, is a callable. An instance of a class that implements the "__call__()" method is also a callable.

callback
   A subroutine function which is passed as an argument to be executed at some point in the future.

class
   A template for creating user-defined objects. Class definitions normally contain method definitions which operate on instances of the class.

class variable
   A variable defined in a class and intended to be modified only at class level (i.e., not in an instance of the class).

closure variable
   A *free variable* referenced from a *nested scope* that is defined in an outer scope rather than being resolved at runtime from the globals or builtin namespaces. May be explicitly defined with the "nonlocal" keyword to allow write access, or implicitly defined if the variable is only being read.

   For example, in the "inner" function in the following code, both "x" and "print" are *free variables*, but only "x" is a *closure variable*:

      def outer():
          x = 0
          def inner():
              nonlocal x
              x += 1
              print(x)
          return inner

   Due to the "codeobject.co_freevars" attribute (which, despite its name, only includes the names of closure variables rather than listing all referenced free variables), the more general *free variable* term is sometimes used even when the intended meaning is to refer specifically to closure variables.

complex number
   An extension of the familiar real number system in which all numbers are expressed as a sum of a real part and an imaginary part. Imaginary numbers are real multiples of the imaginary unit (the square root of "-1"), often written "i" in mathematics or "j" in engineering. Python has built-in support for complex numbers, which are written with this latter notation; the imaginary part is written with a "j" suffix, e.g., "3+1j". To get access to complex equivalents of the "math" module, use "cmath". Use of complex numbers is a fairly advanced mathematical feature.
   If you’re not aware of a need for them, it’s almost certain you can safely ignore them.

context
   This term has different meanings depending on where and how it is used. Some common meanings:

   * The temporary state or environment established by a *context manager* via a "with" statement.

   * The collection of key-value bindings associated with a particular "contextvars.Context" object and accessed via "ContextVar" objects. Also see *context variable*.

   * A "contextvars.Context" object. Also see *current context*.

context management protocol
   The "__enter__()" and "__exit__()" methods called by the "with" statement. See **PEP 343**.

context manager
   An object which implements the *context management protocol* and controls the environment seen in a "with" statement. See **PEP 343**.

context variable
   A variable whose value depends on which context is the *current context*. Values are accessed via "contextvars.ContextVar" objects. Context variables are primarily used to isolate state between concurrent asynchronous tasks.

contiguous
   A buffer is considered contiguous exactly if it is either *C-contiguous* or *Fortran contiguous*. Zero-dimensional buffers are C and Fortran contiguous. In one-dimensional arrays, the items must be laid out in memory next to each other, in order of increasing indexes starting from zero. In multidimensional C-contiguous arrays, the last index varies the fastest when visiting items in order of memory address. However, in Fortran contiguous arrays, the first index varies the fastest.

coroutine
   Coroutines are a more generalized form of subroutines. Subroutines are entered at one point and exited at another point. Coroutines can be entered, exited, and resumed at many different points. They can be implemented with the "async def" statement. See also **PEP 492**.

coroutine function
   A function which returns a *coroutine* object. A coroutine function may be defined with the "async def" statement, and may contain "await", "async for", and "async with" keywords. These were introduced by **PEP 492**.

CPython
   The canonical implementation of the Python programming language, as distributed on python.org. The term “CPython” is used when necessary to distinguish this implementation from others such as Jython or IronPython.

current context
   The *context* ("contextvars.Context" object) that is currently used by "ContextVar" objects to access (get or set) the values of *context variables*. Each thread has its own current context. Frameworks for executing asynchronous tasks (see "asyncio") associate each task with a context which becomes the current context whenever the task starts or resumes execution.

decorator
   A function returning another function, usually applied as a function transformation using the "@wrapper" syntax. Common examples for decorators are "classmethod()" and "staticmethod()".

   The decorator syntax is merely syntactic sugar; the following two function definitions are semantically equivalent:

      def f(arg):
          ...
      f = staticmethod(f)

      @staticmethod
      def f(arg):
          ...

   The same concept exists for classes, but is less commonly used there. See the documentation for function definitions and class definitions for more about decorators.

descriptor
   Any object which defines the methods "__get__()", "__set__()", or "__delete__()". When a class attribute is a descriptor, its special binding behavior is triggered upon attribute lookup. Normally, using *a.b* to get, set or delete an attribute looks up the object named *b* in the class dictionary for *a*, but if *b* is a descriptor, the respective descriptor method gets called. Understanding descriptors is a key to a deep understanding of Python because they are the basis for many features including functions, methods, properties, class methods, static methods, and reference to super classes.

   For more information about descriptors’ methods, see Implementing Descriptors or the Descriptor How To Guide.
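   To make that binding behavior concrete, here is a minimal sketch of a data descriptor (the "Upper" and "Person" classes are illustrative, not from the standard library):

      class Upper:
          "A toy data descriptor that uppercases whatever is stored through it."
          def __set_name__(self, owner, name):
              self._name = '_' + name           # where to stash the value on the instance
          def __get__(self, obj, objtype=None):
              return getattr(obj, self._name)
          def __set__(self, obj, value):
              setattr(obj, self._name, value.upper())

      class Person:
          name = Upper()       # a descriptor used as a class attribute

      p = Person()
      p.name = 'guido'         # routed through Upper.__set__()
      print(p.name)            # routed through Upper.__get__(); prints 'GUIDO'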
dictionary
   An associative array, where arbitrary keys are mapped to values. The keys can be any object with "__hash__()" and "__eq__()" methods. Called a hash in Perl.

dictionary comprehension
   A compact way to process all or part of the elements in an iterable and return a dictionary with the results. "results = {n: n ** 2 for n in range(10)}" generates a dictionary containing key "n" mapped to value "n ** 2". See Displays for lists, sets and dictionaries.

dictionary view
   The objects returned from "dict.keys()", "dict.values()", and "dict.items()" are called dictionary views. They provide a dynamic view on the dictionary’s entries, which means that when the dictionary changes, the view reflects these changes. To force the dictionary view to become a full list use "list(dictview)". See Dictionary view objects.

docstring
   A string literal which appears as the first expression in a class, function or module. While ignored when the suite is executed, it is recognized by the compiler and put into the "__doc__" attribute of the enclosing class, function or module. Since it is available via introspection, it is the canonical place for documentation of the object.

duck-typing
   A programming style which does not look at an object’s type to determine if it has the right interface; instead, the method or attribute is simply called or used (“If it looks like a duck and quacks like a duck, it must be a duck.”) By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using "type()" or "isinstance()". (Note, however, that duck-typing can be complemented with *abstract base classes*.) Instead, it typically employs "hasattr()" tests or *EAFP* programming.

EAFP
   Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many "try" and "except" statements. The technique contrasts with the *LBYL* style common to many other languages such as C.
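   A small sketch contrasting the two styles (the "mapping" and "key" values are placeholders):

      mapping = {'pi': 3.14159}
      key = 'e'

      # LBYL: look before you leap -- test first, then act
      if key in mapping:
          value = mapping[key]
      else:
          value = None

      # EAFP: just act, and handle the failure if it happens
      try:
          value = mapping[key]
      except KeyError:
          value = None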
expression
   A piece of syntax which can be evaluated to some value. In other words, an expression is an accumulation of expression elements like literals, names, attribute access, operators or function calls which all return a value. In contrast to many other languages, not all language constructs are expressions. There are also *statement*s which cannot be used as expressions, such as "while". Assignments are also statements, not expressions.

extension module
   A module written in C or C++, using Python’s C API to interact with the core and with user code.

f-string
   String literals prefixed with "'f'" or "'F'" are commonly called “f-strings” which is short for formatted string literals. See also **PEP 498**.

file object
   An object exposing a file-oriented API (with methods such as "read()" or "write()") to an underlying resource. Depending on the way it was created, a file object can mediate access to a real on-disk file or to another type of storage or communication device (for example standard input/output, in-memory buffers, sockets, pipes, etc.). File objects are also called *file-like objects* or *streams*.

   There are actually three categories of file objects: raw *binary files*, buffered *binary files* and *text files*. Their interfaces are defined in the "io" module. The canonical way to create a file object is by using the "open()" function.

file-like object
   A synonym for *file object*.

filesystem encoding and error handler
   Encoding and error handler used by Python to decode bytes from the operating system and encode Unicode to the operating system.

   The filesystem encoding must guarantee to successfully decode all bytes below 128. If the file system encoding fails to provide this guarantee, API functions can raise "UnicodeError".

   The "sys.getfilesystemencoding()" and "sys.getfilesystemencodeerrors()" functions can be used to get the filesystem encoding and error handler.

   The *filesystem encoding and error handler* are configured at Python startup by the "PyConfig_Read()" function: see "filesystem_encoding" and "filesystem_errors" members of "PyConfig".

   See also the *locale encoding*.

finder
   An object that tries to find the *loader* for a module that is being imported.

   There are two types of finder: *meta path finders* for use with "sys.meta_path", and *path entry finders* for use with "sys.path_hooks".

   See Finders and loaders and "importlib" for much more detail.

floor division
   Mathematical division that rounds down to the nearest integer. The floor division operator is "//". For example, the expression "11 // 4" evaluates to "2" in contrast to the "2.75" returned by float true division. Note that "(-11) // 4" is "-3" because that is "-2.75" rounded *downward*. See **PEP 238**.

free threading
   A threading model where multiple threads can run Python bytecode simultaneously within the same interpreter. This is in contrast to the *global interpreter lock* which allows only one thread to execute Python bytecode at a time. See **PEP 703**.

free variable
   Formally, as defined in the language execution model, a free variable is any variable used in a namespace which is not a local variable in that namespace. See *closure variable* for an example. Pragmatically, due to the name of the "codeobject.co_freevars" attribute, the term is also sometimes used as a synonym for *closure variable*.

function
   A series of statements which returns some value to a caller. It can also be passed zero or more *arguments* which may be used in the execution of the body. See also *parameter*, *method*, and the Function definitions section.

function annotation
   An *annotation* of a function parameter or return value.

   Function annotations are usually used for *type hints*: for example, this function is expected to take two "int" arguments and is also expected to have an "int" return value:

      def sum_two_numbers(a: int, b: int) -> int:
          return a + b

   Function annotation syntax is explained in section Function definitions.

   See *variable annotation* and **PEP 484**, which describe this functionality. Also see Annotations Best Practices for best practices on working with annotations.

__future__
   A future statement, "from __future__ import <feature>", directs the compiler to compile the current module using syntax or semantics that will become standard in a future release of Python. The "__future__" module documents the possible values of *feature*.
   By importing this module and evaluating its variables, you can see when a new feature was first added to the language and when it will (or did) become the default:

      >>> import __future__
      >>> __future__.division
      _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)

garbage collection
   The process of freeing memory when it is not used anymore. Python performs garbage collection via reference counting and a cyclic garbage collector that is able to detect and break reference cycles. The garbage collector can be controlled using the "gc" module.

generator
   A function which returns a *generator iterator*. It looks like a normal function except that it contains "yield" expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the "next()" function.

   Usually refers to a generator function, but may refer to a *generator iterator* in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.

generator iterator
   An object created by a *generator* function.

   Each "yield" temporarily suspends processing, remembering the execution state (including local variables and pending try-statements). When the *generator iterator* resumes, it picks up where it left off (in contrast to functions which start fresh on every invocation).

generator expression
   An *expression* that returns an *iterator*. It looks like a normal expression followed by a "for" clause defining a loop variable, range, and an optional "if" clause. The combined expression generates values for an enclosing function:

      >>> sum(i*i for i in range(10))         # sum of squares 0, 1, 4, ... 81
      285

generic function
   A function composed of multiple functions implementing the same operation for different types. Which implementation should be used during a call is determined by the dispatch algorithm.

   See also the *single dispatch* glossary entry, the "functools.singledispatch()" decorator, and **PEP 443**.

generic type
   A *type* that can be parameterized; typically a container class such as "list" or "dict". Used for *type hints* and *annotations*.

   For more details, see generic alias types, **PEP 483**, **PEP 484**, **PEP 585**, and the "typing" module.

GIL
   See *global interpreter lock*.

global interpreter lock
   The mechanism used by the *CPython* interpreter to assure that only one thread executes Python *bytecode* at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as "dict") implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.

   However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.

   As of Python 3.13, the GIL can be disabled using the "--disable-gil" build configuration. After building Python with this option, code must be run with "-X gil=0" or after setting the "PYTHON_GIL=0" environment variable. This feature enables improved performance for multi-threaded applications and makes it easier to use multi-core CPUs efficiently. For more details, see **PEP 703**.

hash-based pyc
   A bytecode cache file that uses the hash rather than the last-modified time of the corresponding source file to determine its validity. See Cached bytecode invalidation.
hashable
   An object is *hashable* if it has a hash value which never changes during its lifetime (it needs a "__hash__()" method), and can be compared to other objects (it needs an "__eq__()" method). Hashable objects which compare equal must have the same hash value.

   Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

   Most of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not; immutable containers (such as tuples and frozensets) are only hashable if their elements are hashable. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their "id()".

IDLE
   An Integrated Development and Learning Environment for Python. IDLE — Python editor and shell is a basic editor and interpreter environment which ships with the standard distribution of Python.

immortal
   *Immortal objects* are a CPython implementation detail introduced in **PEP 683**.

   If an object is immortal, its *reference count* is never modified, and therefore it is never deallocated while the interpreter is running. For example, "True" and "None" are immortal in CPython.

immutable
   An object with a fixed value. Immutable objects include numbers, strings and tuples. Such an object cannot be altered. A new object has to be created if a different value has to be stored. They play an important role in places where a constant hash value is needed, for example as a key in a dictionary.

import path
   A list of locations (or *path entries*) that are searched by the *path based finder* for modules to import. During import, this list of locations usually comes from "sys.path", but for subpackages it may also come from the parent package’s "__path__" attribute.

importing
   The process by which Python code in one module is made available to Python code in another module.

importer
   An object that both finds and loads a module; both a *finder* and *loader* object.

interactive
   Python has an interactive interpreter which means you can enter statements and expressions at the interpreter prompt, immediately execute them and see their results. Just launch "python" with no arguments (possibly by selecting it from your computer’s main menu). It is a very powerful way to test out new ideas or inspect modules and packages (remember "help(x)"). For more on interactive mode, see Interactive Mode.

interpreted
   Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run. Interpreted languages typically have a shorter development/debug cycle than compiled ones, though their programs generally also run more slowly. See also *interactive*.

interpreter shutdown
   When asked to shut down, the Python interpreter enters a special phase where it gradually releases all allocated resources, such as modules and various critical internal structures. It also makes several calls to the *garbage collector*. This can trigger the execution of code in user-defined destructors or weakref callbacks. Code executed during the shutdown phase can encounter various exceptions as the resources it relies on may not function anymore (common examples are library modules or the warnings machinery).
   The main reason for interpreter shutdown is that the "__main__" module or the script being run has finished executing.

iterable
   An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as "list", "str", and "tuple") and some non-sequence types like "dict", *file objects*, and objects of any classes you define with an "__iter__()" method or with a "__getitem__()" method that implements *sequence* semantics.

   Iterables can be used in a "for" loop and in many other places where a sequence is needed ("zip()", "map()", …). When an iterable object is passed as an argument to the built-in function "iter()", it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call "iter()" or deal with iterator objects yourself. The "for" statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also *iterator*, *sequence*, and *generator*.

iterator
   An object representing a stream of data. Repeated calls to the iterator’s "__next__()" method (or passing it to the built-in function "next()") return successive items in the stream. When no more data are available a "StopIteration" exception is raised instead. At this point, the iterator object is exhausted and any further calls to its "__next__()" method just raise "StopIteration" again. Iterators are required to have an "__iter__()" method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a "list") produces a fresh new iterator each time you pass it to the "iter()" function or use it in a "for" loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.

   More information can be found in Iterator Types.

   **CPython implementation detail:** CPython does not consistently apply the requirement that an iterator define "__iter__()". Also note that the free-threading CPython does not guarantee the thread-safety of iterator operations.

key function
   A key function or collation function is a callable that returns a value used for sorting or ordering. For example, "locale.strxfrm()" is used to produce a sort key that is aware of locale specific sort conventions.

   A number of tools in Python accept key functions to control how elements are ordered or grouped. They include "min()", "max()", "sorted()", "list.sort()", "heapq.merge()", "heapq.nsmallest()", "heapq.nlargest()", and "itertools.groupby()".

   There are several ways to create a key function. For example, the "str.lower()" method can serve as a key function for case insensitive sorts. Alternatively, a key function can be built from a "lambda" expression such as "lambda r: (r[0], r[2])". Also, "operator.attrgetter()", "operator.itemgetter()", and "operator.methodcaller()" are three key function constructors. See the Sorting HOW TO for examples of how to create and use key functions.
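   A brief interactive sketch of a key function in action (the word list is arbitrary):

      >>> words = ['banana', 'Apple', 'Cherry']
      >>> sorted(words)                  # case-sensitive: uppercase letters sort first
      ['Apple', 'Cherry', 'banana']
      >>> sorted(words, key=str.lower)   # case-insensitive, using a key function
      ['Apple', 'banana', 'Cherry']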
keyword argument
   See *argument*.

lambda
   An anonymous inline function consisting of a single *expression* which is evaluated when the function is called. The syntax to create a lambda function is "lambda [parameters]: expression".

LBYL
   Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the *EAFP* approach and is characterized by the presence of many "if" statements.

   In a multi-threaded environment, the LBYL approach can risk introducing a race condition between “the looking” and “the leaping”. For example, the code "if key in mapping: return mapping[key]" can fail if another thread removes *key* from *mapping* after the test, but before the lookup. This issue can be solved with locks or by using the EAFP approach.

lexical analyzer
   Formal name for the *tokenizer*; see *token*.

list
   A built-in Python *sequence*. Despite its name it is more akin to an array in other languages than to a linked list since access to elements is *O*(1).

list comprehension
   A compact way to process all or part of the elements in a sequence and return a list with the results. "result = ['{:#04x}'.format(x) for x in range(256) if x % 2 == 0]" generates a list of strings containing even hex numbers (0x..) in the range from 0 to 255. The "if" clause is optional. If omitted, all elements in "range(256)" are processed.

loader
   An object that loads a module. It must define the "exec_module()" and "create_module()" methods to implement the "Loader" interface. A loader is typically returned by a *finder*. See also:

   * Finders and loaders
   * "importlib.abc.Loader"
   * **PEP 302**

locale encoding
   On Unix, it is the encoding of the LC_CTYPE locale. It can be set with "locale.setlocale(locale.LC_CTYPE, new_locale)".

   On Windows, it is the ANSI code page (ex: "cp1252"). On Android and VxWorks, Python uses "utf-8" as the locale encoding.

   "locale.getencoding()" can be used to get the locale encoding.

   See also the *filesystem encoding and error handler*.

magic method
   An informal synonym for *special method*.

mapping
   A container object that supports arbitrary key lookups and implements the methods specified in the "collections.abc.Mapping" or "collections.abc.MutableMapping" abstract base classes. Examples include "dict", "collections.defaultdict", "collections.OrderedDict" and "collections.Counter".

meta path finder
   A *finder* returned by a search of "sys.meta_path". Meta path finders are related to, but different from *path entry finders*.

   See "importlib.abc.MetaPathFinder" for the methods that meta path finders implement.

metaclass
   The class of a class. Class definitions create a class name, a class dictionary, and a list of base classes. The metaclass is responsible for taking those three arguments and creating the class. Most object oriented programming languages provide a default implementation. What makes Python special is that it is possible to create custom metaclasses. Most users never need this tool, but when the need arises, metaclasses can provide powerful, elegant solutions. They have been used for logging attribute access, adding thread-safety, tracking object creation, implementing singletons, and many other tasks.

   More information can be found in Metaclasses.

method
   A function which is defined inside a class body. If called as an attribute of an instance of that class, the method will get the instance object as its first *argument* (which is usually called "self"). See *function* and *nested scope*.

method resolution order
   Method Resolution Order is the order in which base classes are searched for a member during lookup. See The Python 2.3 Method Resolution Order for details of the algorithm used by the Python interpreter since the 2.3 release.
module
   An object that serves as an organizational unit of Python code. Modules have a namespace containing arbitrary Python objects. Modules are loaded into Python by the process of *importing*.

   See also *package*.

module spec
   A namespace containing the import-related information used to load a module. An instance of "importlib.machinery.ModuleSpec".

   See also Module specs.

MRO
   See *method resolution order*.

mutable
   Mutable objects can change their value but keep their "id()". See also *immutable*.

named tuple
   The term “named tuple” applies to any type or class that inherits from tuple and whose indexable elements are also accessible using named attributes. The type or class may have other features as well.

   Several built-in types are named tuples, including the values returned by "time.localtime()" and "os.stat()". Another example is "sys.float_info":

      >>> sys.float_info[1]                   # indexed access
      1024
      >>> sys.float_info.max_exp              # named field access
      1024
      >>> isinstance(sys.float_info, tuple)   # kind of tuple
      True

   Some named tuples are built-in types (such as the above examples). Alternatively, a named tuple can be created from a regular class definition that inherits from "tuple" and that defines named fields. Such a class can be written by hand, or it can be created by inheriting "typing.NamedTuple", or with the factory function "collections.namedtuple()". The latter techniques also add some extra methods that may not be found in hand-written or built-in named tuples.
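   For instance, here is a short sketch using the factory function (the "Point" name and its fields are arbitrary):

      >>> from collections import namedtuple
      >>> Point = namedtuple('Point', ['x', 'y'])
      >>> p = Point(2, 3)
      >>> p.x                  # named field access
      2
      >>> p[1]                 # indexed access still works
      3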
optimized scope A scope where target local variable names are reliably known to the compiler when the code is compiled, allowing optimization of read and write access to these names. The local namespaces for functions, generators, coroutines, comprehensions, and generator expressions are optimized in this fashion. Note: most interpreter optimizations are applied to all scopes; only those relying on a known set of local and nonlocal variable names are restricted to optimized scopes. package A Python *module* which can contain submodules or, recursively, subpackages. Technically, a package is a Python module with a "__path__" attribute. See also *regular package* and *namespace package*. parameter A named entity in a *function* (or method) definition that specifies an *argument* (or in some cases, arguments) that the function can accept. There are five kinds of parameter: * *positional-or-keyword*: specifies an argument that can be passed either *positionally* or as a *keyword argument*. This is the default kind of parameter, for example *foo* and *bar* in the following: def func(foo, bar=None): ... * *positional-only*: specifies an argument that can be supplied only by position. Positional-only parameters can be defined by including a "/" character in the parameter list of the function definition after them, for example *posonly1* and *posonly2* in the following: def func(posonly1, posonly2, /, positional_or_keyword): ... * *keyword-only*: specifies an argument that can be supplied only by keyword. Keyword-only parameters can be defined by including a single var-positional parameter or bare "*" in the parameter list of the function definition before them, for example *kw_only1* and *kw_only2* in the following: def func(arg, *, kw_only1, kw_only2): ... * *var-positional*: specifies that an arbitrary sequence of positional arguments can be provided (in addition to any positional arguments already accepted by other parameters). Such a parameter can be defined by prepending the parameter name with "*", for example *args* in the following: def func(*args, **kwargs): ... * *var-keyword*: specifies that arbitrarily many keyword arguments can be provided (in addition to any keyword arguments already accepted by other parameters). Such a parameter can be defined by prepending the parameter name with "**", for example *kwargs* in the example above. Parameters can specify both optional and required arguments, as well as default values for some optional arguments. See also the *argument* glossary entry, the FAQ question on the difference between arguments and parameters, the "inspect.Parameter" class, the Function definitions section, and **PEP 362**. path entry A single location on the *import path* which the *path based finder* consults to find modules for importing. path entry finder A *finder* returned by a callable on "sys.path_hooks" (i.e. a *path entry hook*) which knows how to locate modules given a *path entry*. See "importlib.abc.PathEntryFinder" for the methods that path entry finders implement. path entry hook A callable on the "sys.path_hooks" list which returns a *path entry finder* if it knows how to find modules on a specific *path entry*. path based finder One of the default *meta path finders* which searches an *import path* for modules. path-like object An object representing a file system path. A path-like object is either a "str" or "bytes" object representing a path, or an object implementing the "os.PathLike" protocol.
An object that supports the "os.PathLike" protocol can be converted to a "str" or "bytes" file system path by calling the "os.fspath()" function; "os.fsdecode()" and "os.fsencode()" can be used to guarantee a "str" or "bytes" result instead, respectively. Introduced by **PEP 519**. PEP Python Enhancement Proposal. A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment. PEPs should provide a concise technical specification and a rationale for proposed features. PEPs are intended to be the primary mechanisms for proposing major new features, for collecting community input on an issue, and for documenting the design decisions that have gone into Python. The PEP author is responsible for building consensus within the community and documenting dissenting opinions. See **PEP 1**. portion A set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package, as defined in **PEP 420**. positional argument See *argument*. provisional API A provisional API is one which has been deliberately excluded from the standard library’s backwards compatibility guarantees. While major changes to such interfaces are not expected, as long as they are marked provisional, backwards incompatible changes (up to and including removal of the interface) may occur if deemed necessary by core developers. Such changes will not be made gratuitously – they will occur only if serious fundamental flaws are uncovered that were missed prior to the inclusion of the API. Even for provisional APIs, backwards incompatible changes are seen as a “solution of last resort” – every attempt will still be made to find a backwards compatible resolution to any identified problems. This process allows the standard library to continue to evolve over time, without locking in problematic design errors for extended periods of time. See **PEP 411** for more details. provisional package See *provisional API*. Python 3000 Nickname for the Python 3.x release line (coined long ago when the release of version 3 was something in the distant future). This is also abbreviated “Py3k”. Pythonic An idea or piece of code which closely follows the most common idioms of the Python language, rather than implementing code using concepts common to other languages. For example, a common idiom in Python is to loop over all elements of an iterable using a "for" statement. Many other languages don’t have this type of construct, so people unfamiliar with Python sometimes use a numerical counter instead: for i in range(len(food)): print(food[i]) As opposed to the cleaner, Pythonic method: for piece in food: print(piece) qualified name A dotted name showing the “path” from a module’s global scope to a class, function or method defined in that module, as defined in **PEP 3155**. For top-level functions and classes, the qualified name is the same as the object’s name: >>> class C: ... class D: ... def meth(self): ... pass ... >>> C.__qualname__ 'C' >>> C.D.__qualname__ 'C.D' >>> C.D.meth.__qualname__ 'C.D.meth' When used to refer to modules, the *fully qualified name* means the entire dotted path to the module, including any parent packages, e.g. "email.mime.text": >>> import email.mime.text >>> email.mime.text.__name__ 'email.mime.text' reference count The number of references to an object. When the reference count of an object drops to zero, it is deallocated.
Some objects are *immortal* and have reference counts that are never modified, and therefore the objects are never deallocated. Reference counting is generally not visible to Python code, but it is a key element of the *CPython* implementation. Programmers can call the "sys.getrefcount()" function to return the reference count for a particular object. regular package A traditional *package*, such as a directory containing an "__init__.py" file. See also *namespace package*. REPL An acronym for the “read–eval–print loop”, another name for the *interactive* interpreter shell. __slots__ A declaration inside a class that saves memory by pre-declaring space for instance attributes and eliminating instance dictionaries. Though popular, the technique is somewhat tricky to get right and is best reserved for rare cases where there are large numbers of instances in a memory-critical application. sequence An *iterable* which supports efficient element access using integer indices via the "__getitem__()" special method and defines a "__len__()" method that returns the length of the sequence. Some built-in sequence types are "list", "str", "tuple", and "bytes". Note that "dict" also supports "__getitem__()" and "__len__()", but is considered a mapping rather than a sequence because the lookups use arbitrary *hashable* keys rather than integers. The "collections.abc.Sequence" abstract base class defines a much richer interface that goes beyond just "__getitem__()" and "__len__()", adding "count()", "index()", "__contains__()", and "__reversed__()". Types that implement this expanded interface can be registered explicitly using "register()". For more documentation on sequence methods generally, see Common Sequence Operations. set comprehension A compact way to process all or part of the elements in an iterable and return a set with the results. "results = {c for c in 'abracadabra' if c not in 'abc'}" generates the set of strings "{'r', 'd'}". See Displays for lists, sets and dictionaries. single dispatch A form of *generic function* dispatch where the implementation is chosen based on the type of a single argument. slice An object usually containing a portion of a *sequence*. A slice is created using the subscript notation, "[]" with colons between numbers when several are given, such as in "variable_name[1:3:5]". The bracket (subscript) notation uses "slice" objects internally. soft deprecated A soft deprecated API should not be used in new code, but it is safe for already existing code to use it. The API remains documented and tested, but will not be enhanced further. Soft deprecation, unlike normal deprecation, does not plan on removing the API and will not emit warnings. See PEP 387: Soft Deprecation. special method A method that is called implicitly by Python to execute a certain operation on a type, such as addition. Such methods have names starting and ending with double underscores. Special methods are documented in Special method names. statement A statement is part of a suite (a “block” of code). A statement is either an *expression* or one of several constructs with a keyword, such as "if", "while" or "for". static type checker An external tool that reads Python code and analyzes it, looking for issues such as incorrect types. See also *type hints* and the "typing" module. strong reference In Python’s C API, a strong reference is a reference to an object which is owned by the code holding the reference. 
The strong reference is taken by calling "Py_INCREF()" when the reference is created and released with "Py_DECREF()" when the reference is deleted. The "Py_NewRef()" function can be used to create a strong reference to an object. Usually, the "Py_DECREF()" function must be called on the strong reference before exiting the scope of the strong reference, to avoid leaking one reference. See also *borrowed reference*. text encoding A string in Python is a sequence of Unicode code points (in range "U+0000"–"U+10FFFF"). To store or transfer a string, it needs to be serialized as a sequence of bytes. Serializing a string into a sequence of bytes is known as “encoding”, and recreating the string from the sequence of bytes is known as “decoding”. There are a variety of different text serialization codecs, which are collectively referred to as “text encodings”. text file A *file object* able to read and write "str" objects. Often, a text file actually accesses a byte-oriented datastream and handles the *text encoding* automatically. Examples of text files are files opened in text mode ("'r'" or "'w'"), "sys.stdin", "sys.stdout", and instances of "io.StringIO". See also *binary file* for a file object able to read and write *bytes-like objects*. token A small unit of source code, generated by the lexical analyzer (also called the *tokenizer*). Names, numbers, strings, operators, newlines and similar are represented by tokens. The "tokenize" module exposes Python’s lexical analyzer. The "token" module contains information on the various types of tokens. triple-quoted string A string which is bound by three instances of either a quotation mark (") or an apostrophe ('). While they don’t provide any functionality not available with single-quoted strings, they are useful for a number of reasons. They allow you to include unescaped single and double quotes within a string and they can span multiple lines without the use of the continuation character, making them especially useful when writing docstrings. type The type of a Python object determines what kind of object it is; every object has a type. An object’s type is accessible as its "__class__" attribute or can be retrieved with "type(obj)". type alias A synonym for a type, created by assigning the type to an identifier. Type aliases are useful for simplifying *type hints*. For example: def remove_gray_shades( colors: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]: pass could be made more readable like this: Color = tuple[int, int, int] def remove_gray_shades(colors: list[Color]) -> list[Color]: pass See "typing" and **PEP 484**, which describe this functionality. type hint An *annotation* that specifies the expected type for a variable, a class attribute, or a function parameter or return value. Type hints are optional and are not enforced by Python but they are useful to *static type checkers*. They can also aid IDEs with code completion and refactoring. Type hints of global variables, class attributes, and functions, but not local variables, can be accessed using "typing.get_type_hints()". See "typing" and **PEP 484**, which describe this functionality. universal newlines A manner of interpreting text streams in which all of the following are recognized as ending a line: the Unix end-of-line convention "'\n'", the Windows convention "'\r\n'", and the old Macintosh convention "'\r'". See **PEP 278** and **PEP 3116**, as well as "bytes.splitlines()" for an additional use.
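As a small illustration of universal newlines (the file name is hypothetical), both "str.splitlines()" and text-mode "open()" recognize all three conventions:

   # splitlines() treats '\n', '\r\n' and '\r' as line endings
   print("unix\nwindows\r\nold mac\r".splitlines())
   # ['unix', 'windows', 'old mac']

   # In text mode, open() defaults to newline=None, so any of the
   # three conventions is translated to '\n' on reading.
   with open("example.txt", newline=None) as f:
       text = f.read()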
variable annotation An *annotation* of a variable or a class attribute. When annotating a variable or a class attribute, assignment is optional: class C: field: 'annotation' Variable annotations are usually used for *type hints*: for example this variable is expected to take "int" values: count: int = 0 Variable annotation syntax is explained in section Annotated assignment statements. See *function annotation*, **PEP 484** and **PEP 526**, which describe this functionality. Also see Annotations Best Practices for best practices on working with annotations. virtual environment A cooperatively isolated runtime environment that allows Python users and applications to install and upgrade Python distribution packages without interfering with the behaviour of other Python applications running on the same system. See also "venv". virtual machine A computer defined entirely in software. Python’s virtual machine executes the *bytecode* emitted by the bytecode compiler. Zen of Python Listing of Python design principles and philosophies that are helpful in understanding and using the language. The listing can be found by typing “"import this"” at the interactive prompt. Annotations Best Practices ************************** author: Larry Hastings Abstract ^^^^^^^^ This document is designed to encapsulate the best practices for working with annotations dicts. If you write Python code that examines "__annotations__" on Python objects, we encourage you to follow the guidelines described below. The document is organized into four sections: best practices for accessing the annotations of an object in Python versions 3.10 and newer, best practices for accessing the annotations of an object in Python versions 3.9 and older, other best practices for "__annotations__" that apply to any Python version, and quirks of "__annotations__". Note that this document is specifically about working with "__annotations__", not uses *for* annotations. If you’re looking for information on how to use “type hints” in your code, please see the "typing" module. Accessing The Annotations Dict Of An Object In Python 3.10 And Newer ==================================================================== Python 3.10 adds a new function to the standard library: "inspect.get_annotations()". In Python versions 3.10 and newer, calling this function is the best practice for accessing the annotations dict of any object that supports annotations. This function can also “un-stringize” stringized annotations for you. If for some reason "inspect.get_annotations()" isn’t viable for your use case, you may access the "__annotations__" data member manually. Best practice for this changed in Python 3.10 as well: as of Python 3.10, "o.__annotations__" is guaranteed to *always* work on Python functions, classes, and modules. If you’re certain the object you’re examining is one of these three *specific* objects, you may simply use "o.__annotations__" to get at the object’s annotations dict. However, other types of callables–for example, callables created by "functools.partial()"–may not have an "__annotations__" attribute defined. When accessing the "__annotations__" of a possibly unknown object, best practice in Python versions 3.10 and newer is to call "getattr()" with three arguments, for example "getattr(o, '__annotations__', None)". Before Python 3.10, accessing "__annotations__" on a class that defines no annotations but that has a parent class with annotations would return the parent’s "__annotations__". 
In Python 3.10 and newer, the child class’s annotations will be an empty dict instead. Accessing The Annotations Dict Of An Object In Python 3.9 And Older =================================================================== In Python 3.9 and older, accessing the annotations dict of an object is much more complicated than in newer versions. The problem is a design flaw in these older versions of Python, specifically to do with class annotations. Best practice for accessing the annotations dict of other objects–functions, other callables, and modules–is the same as best practice for 3.10, assuming you aren’t calling "inspect.get_annotations()": you should use three-argument "getattr()" to access the object’s "__annotations__" attribute. Unfortunately, this isn’t best practice for classes. The problem is that, since "__annotations__" is optional on classes, and because classes can inherit attributes from their base classes, accessing the "__annotations__" attribute of a class may inadvertently return the annotations dict of a *base class.* As an example: class Base: a: int = 3 b: str = 'abc' class Derived(Base): pass print(Derived.__annotations__) This will print the annotations dict from "Base", not "Derived". Your code will have to have a separate code path if the object you’re examining is a class ("isinstance(o, type)"). In that case, best practice relies on an implementation detail of Python 3.9 and before: if a class has annotations defined, they are stored in the class’s "__dict__" dictionary. Since the class may or may not have annotations defined, best practice is to call the "get()" method on the class dict. To put it all together, here is some sample code that safely accesses the "__annotations__" attribute on an arbitrary object in Python 3.9 and before: if isinstance(o, type): ann = o.__dict__.get('__annotations__', None) else: ann = getattr(o, '__annotations__', None) After running this code, "ann" should be either a dictionary or "None". You’re encouraged to double-check the type of "ann" using "isinstance()" before further examination. Note that some exotic or malformed type objects may not have a "__dict__" attribute, so for extra safety you may also wish to use "getattr()" to access "__dict__". Manually Un-Stringizing Stringized Annotations ============================================== In situations where some annotations may be “stringized”, and you wish to evaluate those strings to produce the Python values they represent, it really is best to call "inspect.get_annotations()" to do this work for you. If you’re using Python 3.9 or older, or if for some reason you can’t use "inspect.get_annotations()", you’ll need to duplicate its logic. You’re encouraged to examine the implementation of "inspect.get_annotations()" in the current Python version and follow a similar approach. In a nutshell, if you wish to evaluate a stringized annotation on an arbitrary object "o": * If "o" is a module, use "o.__dict__" as the "globals" when calling "eval()". * If "o" is a class, use "sys.modules[o.__module__].__dict__" as the "globals", and "dict(vars(o))" as the "locals", when calling "eval()". * If "o" is a wrapped callable using "functools.update_wrapper()", "functools.wraps()", or "functools.partial()", iteratively unwrap it by accessing either "o.__wrapped__" or "o.func" as appropriate, until you have found the root unwrapped function. * If "o" is a callable (but not a class), use "o.__globals__" as the globals when calling "eval()". 
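Putting those rules together, here is a rough sketch (the "eval_annotations()" helper is hypothetical, and error handling is omitted; it is not a substitute for the real "inspect.get_annotations()" logic):

   import functools
   import sys
   import types

   def eval_annotations(o):
       # Iteratively unwrap wrapped callables to find the root function.
       while True:
           if hasattr(o, '__wrapped__'):
               o = o.__wrapped__
           elif isinstance(o, functools.partial):
               o = o.func
           else:
               break
       if isinstance(o, type):                  # a class
           globals_ = sys.modules[o.__module__].__dict__
           locals_ = dict(vars(o))
       elif isinstance(o, types.ModuleType):    # a module
           globals_, locals_ = o.__dict__, None
       else:                                    # any other callable
           globals_, locals_ = getattr(o, '__globals__', None), None
       ann = getattr(o, '__annotations__', None) or {}
       return {name: eval(value, globals_, locals_)
                     if isinstance(value, str) else value
               for name, value in ann.items()}

Keep in mind the caveat that follows: "eval()" can fail even on legitimate stringized annotations.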
However, not all string values used as annotations can be successfully turned into Python values by "eval()". String values could theoretically contain any valid string, and in practice there are valid use cases for type hints that require annotating with string values that specifically *can’t* be evaluated. For example: * **PEP 604** union types using "|", before support for this was added to Python 3.10. * Definitions that aren’t needed at runtime, only imported when "typing.TYPE_CHECKING" is true. If "eval()" attempts to evaluate such values, it will fail and raise an exception. So, when designing a library API that works with annotations, it’s recommended to only attempt to evaluate string values when explicitly requested to by the caller. Best Practices For "__annotations__" In Any Python Version ========================================================== * You should avoid assigning to the "__annotations__" member of objects directly. Let Python manage setting "__annotations__". * If you do assign directly to the "__annotations__" member of an object, you should always set it to a "dict" object. * If you directly access the "__annotations__" member of an object, you should ensure that it’s a dictionary before attempting to examine its contents. * You should avoid modifying "__annotations__" dicts. * You should avoid deleting the "__annotations__" attribute of an object. "__annotations__" Quirks ======================== In all versions of Python 3, function objects lazy-create an annotations dict if no annotations are defined on that object. You can delete the "__annotations__" attribute using "del fn.__annotations__", but if you then access "fn.__annotations__" the object will create a new empty dict that it will store and return as its annotations. Deleting the annotations on a function before it has lazily created its annotations dict will throw an "AttributeError"; using "del fn.__annotations__" twice in a row is guaranteed to always throw an "AttributeError". Everything in the above paragraph also applies to class and module objects in Python 3.10 and newer. In all versions of Python 3, you can set "__annotations__" on a function object to "None". However, subsequently accessing the annotations on that object using "fn.__annotations__" will lazy-create an empty dictionary as per the first paragraph of this section. This is *not* true of modules and classes, in any Python version; those objects permit setting "__annotations__" to any Python value, and will retain whatever value is set. If Python stringizes your annotations for you (using "from __future__ import annotations"), and you specify a string as an annotation, the string will itself be quoted. In effect the annotation is quoted *twice.* For example: from __future__ import annotations def foo(a: "str"): pass print(foo.__annotations__) This prints "{'a': "'str'"}". This shouldn’t really be considered a “quirk”; it’s mentioned here simply because it might be surprising. Migrating "optparse" code to "argparse" *************************************** The "argparse" module offers several higher level features not natively provided by the "optparse" module, including: * Handling positional arguments. * Supporting subcommands. * Allowing alternative option prefixes like "+" and "/". * Handling zero-or-more and one-or-more style arguments. * Producing more informative usage messages. * Providing a much simpler interface for custom "type" and "action". 
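To give a feel for the shape of a migration before looking at the specific suggestions below, here is a minimal hypothetical before-and-after (the option names and version string are invented for illustration):

   # Before: optparse
   from optparse import OptionParser

   parser = OptionParser(version="1.0")
   parser.add_option("-n", "--num", type="int", default=1)
   (options, args) = parser.parse_args()   # positionals land in "args"

   # After: argparse
   from argparse import ArgumentParser

   parser = ArgumentParser()
   parser.add_argument("--version", action="version", version="1.0")
   parser.add_argument("-n", "--num", type=int, default=1)  # a real type object
   parser.add_argument("filenames", nargs="*")  # positionals are now declared
   args = parser.parse_args()               # a single Namespace holds everything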
Originally, the "argparse" module attempted to maintain compatibility with "optparse". However, the fundamental design differences between supporting declarative command line option processing (while leaving positional argument processing to application code), and supporting both named options and positional arguments in the declarative interface mean that the API has diverged from that of "optparse" over time. As described in Choosing an argument parsing library, applications that are currently using "optparse" and are happy with the way it works can just continue to use "optparse". Application developers that are considering migrating should also review the list of intrinsic behavioural differences described in that section before deciding whether or not migration is desirable. For applications that do choose to migrate from "optparse" to "argparse", the following suggestions should be helpful: * Replace all "optparse.OptionParser.add_option()" calls with "ArgumentParser.add_argument()" calls. * Replace "(options, args) = parser.parse_args()" with "args = parser.parse_args()" and add additional "ArgumentParser.add_argument()" calls for the positional arguments. Keep in mind that what was previously called "options" is now called "args" in the "argparse" context. * Replace "optparse.OptionParser.disable_interspersed_args()" by using "parse_intermixed_args()" instead of "parse_args()". * Replace callback actions and the "callback_*" keyword arguments with "type" or "action" arguments. * Replace string names for "type" keyword arguments with the corresponding type objects (e.g. int, float, complex, etc.). * Replace "optparse.Values" with "Namespace" and "optparse.OptionError" and "optparse.OptionValueError" with "ArgumentError". * Replace strings with implicit arguments such as "%default" or "%prog" with the standard Python syntax to use dictionaries to format strings, that is, "%(default)s" and "%(prog)s". * Replace the OptionParser constructor "version" argument with a call to "parser.add_argument('--version', action='version', version='<the version>')". Argparse Tutorial ***************** author: Tshepang Mbambo This tutorial is intended to be a gentle introduction to "argparse", the recommended command-line parsing module in the Python standard library. Note: The standard library includes two other libraries directly related to command-line parameter processing: the lower level "optparse" module (which may require more code to configure for a given application, but also allows an application to request behaviors that "argparse" doesn’t support), and the very low level "getopt" (which specifically serves as an equivalent to the "getopt()" family of functions available to C programmers). While neither of those modules is covered directly in this guide, many of the core concepts in "argparse" first originated in "optparse", so some aspects of this tutorial will also be relevant to "optparse" users. Concepts ======== Let’s show the sort of functionality that we are going to explore in this introductory tutorial by making use of the **ls** command: $ ls cpython devguide prog.py pypy rm-unused-function.patch $ ls pypy ctypes_configure demo dotviewer include lib_pypy lib-python ... $ ls -l total 20 drwxr-xr-x 19 wena wena 4096 Feb 18 18:51 cpython drwxr-xr-x 4 wena wena 4096 Feb 8 12:04 devguide -rwxr-xr-x 1 wena wena 535 Feb 19 00:05 prog.py drwxr-xr-x 14 wena wena 4096 Feb 7 00:59 pypy -rw-r--r-- 1 wena wena 741 Feb 18 01:01 rm-unused-function.patch $ ls --help Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default). Sort entries alphabetically if none of -cftuvSUX nor --sort is specified. ... A few concepts we can learn from the four commands: * The **ls** command is useful when run without any options at all. It defaults to displaying the contents of the current directory. * If we want more than it provides by default, we tell it a bit more. In this case, we want it to display a different directory, "pypy". What we did is specify what is known as a positional argument. It’s named so because the program should know what to do with the value, solely based on where it appears on the command line. This concept is more relevant to a command like **cp**, whose most basic usage is "cp SRC DEST". The first position is *what you want copied,* and the second position is *where you want it copied to*. * Now, say we want to change the behaviour of the program. In our example, we display more info for each file instead of just showing the file names. The "-l" in that case is known as an optional argument. * That’s a snippet of the help text. It’s very useful in that you can come across a program you have never used before, and can figure out how it works simply by reading its help text. The basics ========== Let us start with a very simple example which does (almost) nothing: import argparse parser = argparse.ArgumentParser() parser.parse_args() Following is a result of running the code: $ python prog.py $ python prog.py --help usage: prog.py [-h] options: -h, --help show this help message and exit $ python prog.py --verbose usage: prog.py [-h] prog.py: error: unrecognized arguments: --verbose $ python prog.py foo usage: prog.py [-h] prog.py: error: unrecognized arguments: foo Here is what is happening: * Running the script without any options results in nothing displayed to stdout. Not so useful. * The second one starts to display the usefulness of the "argparse" module. We have done almost nothing, but already we get a nice help message. * The "--help" option, which can also be shortened to "-h", is the only option we get for free (i.e. no need to specify it). Specifying anything else results in an error. But even then, we do get a useful usage message, also for free. Introducing Positional arguments ================================ An example: import argparse parser = argparse.ArgumentParser() parser.add_argument("echo") args = parser.parse_args() print(args.echo) And running the code: $ python prog.py usage: prog.py [-h] echo prog.py: error: the following arguments are required: echo $ python prog.py --help usage: prog.py [-h] echo positional arguments: echo options: -h, --help show this help message and exit $ python prog.py foo foo Here is what’s happening: * We’ve added the "add_argument()" method, which is what we use to specify which command-line options the program is willing to accept. In this case, I’ve named it "echo" so that it’s in line with its function. * Calling our program now requires us to specify an option. * The "parse_args()" method actually returns some data from the options specified, in this case, "echo". * The variable is some form of ‘magic’ that "argparse" performs for free (i.e. no need to specify which variable that value is stored in). You will also notice that its name matches the string argument given to the method, "echo". Note however that, although the help display looks nice and all, it currently is not as helpful as it can be.
For example, we see that we got "echo" as a positional argument, but we don’t know what it does, other than by guessing or by reading the source code. So, let’s make it a bit more useful: import argparse parser = argparse.ArgumentParser() parser.add_argument("echo", help="echo the string you use here") args = parser.parse_args() print(args.echo) And we get: $ python prog.py -h usage: prog.py [-h] echo positional arguments: echo echo the string you use here options: -h, --help show this help message and exit Now, how about doing something even more useful: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", help="display a square of a given number") args = parser.parse_args() print(args.square**2) Following is a result of running the code: $ python prog.py 4 Traceback (most recent call last): File "prog.py", line 5, in <module> print(args.square**2) TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int' That didn’t go so well. That’s because "argparse" treats the options we give it as strings, unless we tell it otherwise. So, let’s tell "argparse" to treat that input as an integer: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", help="display a square of a given number", type=int) args = parser.parse_args() print(args.square**2) Following is a result of running the code: $ python prog.py 4 16 $ python prog.py four usage: prog.py [-h] square prog.py: error: argument square: invalid int value: 'four' That went well. The program now even helpfully quits on illegal input before proceeding. Introducing Optional arguments ============================== So far we have been playing with positional arguments. Let us have a look at how to add optional ones: import argparse parser = argparse.ArgumentParser() parser.add_argument("--verbosity", help="increase output verbosity") args = parser.parse_args() if args.verbosity: print("verbosity turned on") And the output: $ python prog.py --verbosity 1 verbosity turned on $ python prog.py $ python prog.py --help usage: prog.py [-h] [--verbosity VERBOSITY] options: -h, --help show this help message and exit --verbosity VERBOSITY increase output verbosity $ python prog.py --verbosity usage: prog.py [-h] [--verbosity VERBOSITY] prog.py: error: argument --verbosity: expected one argument Here is what is happening: * The program is written so as to display something when "--verbosity" is specified and display nothing when not. * To show that the option is actually optional, there is no error when running the program without it. Note that by default, if an optional argument isn’t used, the relevant variable, in this case "args.verbosity", is given "None" as a value, which is the reason it fails the truth test of the "if" statement. * The help message is a bit different. * When using the "--verbosity" option, one must also specify some value, any value. The above example accepts arbitrary integer values for "--verbosity", but for our simple program, only two values are actually useful, "True" or "False".
Let’s modify the code accordingly: import argparse parser = argparse.ArgumentParser() parser.add_argument("--verbose", help="increase output verbosity", action="store_true") args = parser.parse_args() if args.verbose: print("verbosity turned on") And the output: $ python prog.py --verbose verbosity turned on $ python prog.py --verbose 1 usage: prog.py [-h] [--verbose] prog.py: error: unrecognized arguments: 1 $ python prog.py --help usage: prog.py [-h] [--verbose] options: -h, --help show this help message and exit --verbose increase output verbosity Here is what is happening: * The option is now more of a flag than something that requires a value. We even changed the name of the option to match that idea. Note that we now specify a new keyword, "action", and give it the value ""store_true"". This means that, if the option is specified, assign the value "True" to "args.verbose". Not specifying it implies "False". * It complains when you specify a value, in true spirit of what flags actually are. * Notice the different help text. Short options ------------- If you are familiar with command line usage, you will notice that I haven’t yet touched on the topic of short versions of the options. It’s quite simple: import argparse parser = argparse.ArgumentParser() parser.add_argument("-v", "--verbose", help="increase output verbosity", action="store_true") args = parser.parse_args() if args.verbose: print("verbosity turned on") And here goes: $ python prog.py -v verbosity turned on $ python prog.py --help usage: prog.py [-h] [-v] options: -h, --help show this help message and exit -v, --verbose increase output verbosity Note that the new ability is also reflected in the help text. Combining Positional and Optional arguments =========================================== Our program keeps growing in complexity: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbose", action="store_true", help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbose: print(f"the square of {args.square} equals {answer}") else: print(answer) And now the output: $ python prog.py usage: prog.py [-h] [-v] square prog.py: error: the following arguments are required: square $ python prog.py 4 16 $ python prog.py 4 --verbose the square of 4 equals 16 $ python prog.py --verbose 4 the square of 4 equals 16 * We’ve brought back a positional argument, hence the complaint. * Note that the order does not matter. How about we give this program of ours back the ability to have multiple verbosity values, and actually get to use them: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbosity", type=int, help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbosity == 2: print(f"the square of {args.square} equals {answer}") elif args.verbosity == 1: print(f"{args.square}^2 == {answer}") else: print(answer) And the output: $ python prog.py 4 16 $ python prog.py 4 -v usage: prog.py [-h] [-v VERBOSITY] square prog.py: error: argument -v/--verbosity: expected one argument $ python prog.py 4 -v 1 4^2 == 16 $ python prog.py 4 -v 2 the square of 4 equals 16 $ python prog.py 4 -v 3 16 These all look good except the last one, which exposes a bug in our program. 
Let’s fix it by restricting the values the "--verbosity" option can accept: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2], help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbosity == 2: print(f"the square of {args.square} equals {answer}") elif args.verbosity == 1: print(f"{args.square}^2 == {answer}") else: print(answer) And the output: $ python prog.py 4 -v 3 usage: prog.py [-h] [-v {0,1,2}] square prog.py: error: argument -v/--verbosity: invalid choice: 3 (choose from 0, 1, 2) $ python prog.py 4 -h usage: prog.py [-h] [-v {0,1,2}] square positional arguments: square display a square of a given number options: -h, --help show this help message and exit -v, --verbosity {0,1,2} increase output verbosity Note that the change is reflected both in the error message and in the help string. Now, let’s use a different approach of playing with verbosity, which is pretty common. It also matches the way the CPython executable handles its own verbosity argument (check the output of "python --help"): import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display the square of a given number") parser.add_argument("-v", "--verbosity", action="count", help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbosity == 2: print(f"the square of {args.square} equals {answer}") elif args.verbosity == 1: print(f"{args.square}^2 == {answer}") else: print(answer) We have introduced another action, “count”, to count the number of occurrences of specific options. $ python prog.py 4 16 $ python prog.py 4 -v 4^2 == 16 $ python prog.py 4 -vv the square of 4 equals 16 $ python prog.py 4 --verbosity --verbosity the square of 4 equals 16 $ python prog.py 4 -v 1 usage: prog.py [-h] [-v] square prog.py: error: unrecognized arguments: 1 $ python prog.py 4 -h usage: prog.py [-h] [-v] square positional arguments: square display a square of a given number options: -h, --help show this help message and exit -v, --verbosity increase output verbosity $ python prog.py 4 -vvv 16 * Yes, it’s now more of a flag (similar to "action="store_true"" in the previous version of our script). That should explain the complaint. * It also behaves similarly to the "store_true" action. * Now here’s a demonstration of what the “count” action gives. You’ve probably seen this sort of usage before. * And if you don’t specify the "-v" flag, that flag is considered to have the value "None". * As should be expected, when specifying the long form of the flag, we get the same output. * Sadly, our help output isn’t very informative on the new ability our script has acquired, but that can always be fixed by improving the documentation for our script (e.g. via the "help" keyword argument). * That last output exposes a bug in our program.
Let’s fix it: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbosity", action="count", help="increase output verbosity") args = parser.parse_args() answer = args.square**2 # bugfix: replace == with >= if args.verbosity >= 2: print(f"the square of {args.square} equals {answer}") elif args.verbosity >= 1: print(f"{args.square}^2 == {answer}") else: print(answer) And this is what it gives: $ python prog.py 4 -vvv the square of 4 equals 16 $ python prog.py 4 -vvvv the square of 4 equals 16 $ python prog.py 4 Traceback (most recent call last): File "prog.py", line 11, in <module> if args.verbosity >= 2: TypeError: '>=' not supported between instances of 'NoneType' and 'int' * The first output went well, and fixes the bug we had before. That is, we want any value >= 2 to be as verbose as possible. * The third output, not so good. Let’s fix that bug: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbosity", action="count", default=0, help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbosity >= 2: print(f"the square of {args.square} equals {answer}") elif args.verbosity >= 1: print(f"{args.square}^2 == {answer}") else: print(answer) We’ve just introduced yet another keyword, "default". We’ve set it to "0" in order to make it comparable to the other int values. Remember that by default, if an optional argument isn’t specified, it gets the "None" value, and that cannot be compared to an int value (hence the "TypeError" exception). And: $ python prog.py 4 16 You can go quite far just with what we’ve learned so far, and we have only scratched the surface. The "argparse" module is very powerful, and we’ll explore a bit more of it before we end this tutorial. Getting a little more advanced ============================== What if we wanted to expand our tiny program to perform other powers, not just squares: import argparse parser = argparse.ArgumentParser() parser.add_argument("x", type=int, help="the base") parser.add_argument("y", type=int, help="the exponent") parser.add_argument("-v", "--verbosity", action="count", default=0) args = parser.parse_args() answer = args.x**args.y if args.verbosity >= 2: print(f"{args.x} to the power {args.y} equals {answer}") elif args.verbosity >= 1: print(f"{args.x}^{args.y} == {answer}") else: print(answer) Output: $ python prog.py usage: prog.py [-h] [-v] x y prog.py: error: the following arguments are required: x, y $ python prog.py -h usage: prog.py [-h] [-v] x y positional arguments: x the base y the exponent options: -h, --help show this help message and exit -v, --verbosity $ python prog.py 4 2 -v 4^2 == 16 Notice that so far we’ve been using verbosity level to *change* the text that gets displayed.
The following example instead uses verbosity level to display *more* text: import argparse parser = argparse.ArgumentParser() parser.add_argument("x", type=int, help="the base") parser.add_argument("y", type=int, help="the exponent") parser.add_argument("-v", "--verbosity", action="count", default=0) args = parser.parse_args() answer = args.x**args.y if args.verbosity >= 2: print(f"Running '{__file__}'") if args.verbosity >= 1: print(f"{args.x}^{args.y} == ", end="") print(answer) Output: $ python prog.py 4 2 16 $ python prog.py 4 2 -v 4^2 == 16 $ python prog.py 4 2 -vv Running 'prog.py' 4^2 == 16 Specifying ambiguous arguments ------------------------------ When there is ambiguity in deciding whether an argument is positional or belongs to an option, "--" can be used to tell "parse_args()" that everything after that is a positional argument: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-n', nargs='+') >>> parser.add_argument('args', nargs='*') >>> # ambiguous, so parse_args assumes it's an option >>> parser.parse_args(['-f']) usage: PROG [-h] [-n N [N ...]] [args ...] PROG: error: unrecognized arguments: -f >>> parser.parse_args(['--', '-f']) Namespace(args=['-f'], n=None) >>> # ambiguous, so the -n option greedily accepts arguments >>> parser.parse_args(['-n', '1', '2', '3']) Namespace(args=[], n=['1', '2', '3']) >>> parser.parse_args(['-n', '1', '--', '2', '3']) Namespace(args=['2', '3'], n=['1']) Conflicting options ------------------- So far, we have been working with two methods of an "argparse.ArgumentParser" instance. Let’s introduce a third one, "add_mutually_exclusive_group()". It allows us to specify options that conflict with each other. Let’s also change the rest of the program so that the new functionality makes more sense: we’ll introduce the "--quiet" option, which will be the opposite of the "--verbose" one: import argparse parser = argparse.ArgumentParser() group = parser.add_mutually_exclusive_group() group.add_argument("-v", "--verbose", action="store_true") group.add_argument("-q", "--quiet", action="store_true") parser.add_argument("x", type=int, help="the base") parser.add_argument("y", type=int, help="the exponent") args = parser.parse_args() answer = args.x**args.y if args.quiet: print(answer) elif args.verbose: print(f"{args.x} to the power {args.y} equals {answer}") else: print(f"{args.x}^{args.y} == {answer}") Our program is now simpler, and we’ve lost some functionality for the sake of demonstration. Anyways, here’s the output: $ python prog.py 4 2 4^2 == 16 $ python prog.py 4 2 -q 16 $ python prog.py 4 2 -v 4 to the power 2 equals 16 $ python prog.py 4 2 -vq usage: prog.py [-h] [-v | -q] x y prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose $ python prog.py 4 2 -v --quiet usage: prog.py [-h] [-v | -q] x y prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose That should be easy to follow. I’ve added that last output so you can see the sort of flexibility you get, i.e. mixing long form options with short form ones.
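As an aside, if one of the two options must be supplied, the group can be created with "required=True"; a small sketch building on the parser above:

   group = parser.add_mutually_exclusive_group(required=True)
   group.add_argument("-v", "--verbose", action="store_true")
   group.add_argument("-q", "--quiet", action="store_true")

   # Running the program with neither -v nor -q now fails with an error
   # along the lines of:
   #   prog.py: error: one of the arguments -v/--verbose -q/--quiet is required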
Before we conclude, you probably want to tell your users the main purpose of your program, just in case they don’t know: import argparse parser = argparse.ArgumentParser(description="calculate X to the power of Y") group = parser.add_mutually_exclusive_group() group.add_argument("-v", "--verbose", action="store_true") group.add_argument("-q", "--quiet", action="store_true") parser.add_argument("x", type=int, help="the base") parser.add_argument("y", type=int, help="the exponent") args = parser.parse_args() answer = args.x**args.y if args.quiet: print(answer) elif args.verbose: print(f"{args.x} to the power {args.y} equals {answer}") else: print(f"{args.x}^{args.y} == {answer}") Note the slight difference in the usage text: the "[-v | -q]" tells us that we can use either "-v" or "-q", but not both at the same time: $ python prog.py --help usage: prog.py [-h] [-v | -q] x y calculate X to the power of Y positional arguments: x the base y the exponent options: -h, --help show this help message and exit -v, --verbose -q, --quiet How to translate the argparse output ==================================== The outputs of the "argparse" module, such as its help text and error messages, are all made translatable using the "gettext" module. This allows applications to easily localize messages produced by "argparse". See also Internationalizing your programs and modules. For instance, in this "argparse" output: $ python prog.py --help usage: prog.py [-h] [-v | -q] x y calculate X to the power of Y positional arguments: x the base y the exponent options: -h, --help show this help message and exit -v, --verbose -q, --quiet The strings "usage:", "positional arguments:", "options:" and "show this help message and exit" are all translatable. In order to translate these strings, they must first be extracted into a ".po" file. For example, using Babel, run this command: $ pybabel extract -o messages.po /usr/lib/python3.12/argparse.py This command will extract all translatable strings from the "argparse" module and output them into a file named "messages.po". This command assumes that your Python installation is in "/usr/lib". You can find out the location of the "argparse" module on your system using this script: import argparse print(argparse.__file__) Once the messages in the ".po" file are translated and the translations are installed using "gettext", "argparse" will be able to display the translated messages. To translate your own strings in the "argparse" output, use "gettext". Custom type converters ====================== The "argparse" module allows you to specify custom type converters for your command-line arguments. This allows you to modify user input before it’s stored in the "argparse.Namespace". This can be useful when you need to pre-process the input before it is used in your program. When using a custom type converter, you can use any callable that takes a single string argument (the argument value) and returns the converted value. However, if you need to handle more complex scenarios, you can use a custom action class with the **action** parameter instead.
For example, let’s say you want to handle arguments with different prefixes and process them accordingly: import argparse parser = argparse.ArgumentParser(prefix_chars='-+') parser.add_argument('-a', metavar='', action='append', type=lambda x: ('-', x)) parser.add_argument('+a', metavar='', action='append', type=lambda x: ('+', x)) args = parser.parse_args() print(args) Output: $ python prog.py -a value1 +a value2 Namespace(a=[('-', 'value1'), ('+', 'value2')]) In this example, we: * Created a parser with custom prefix characters using the "prefix_chars" parameter. * Defined two arguments, "-a" and "+a", which used the "type" parameter to create custom type converters to store the value in a tuple with the prefix. Without the custom type converters, the parser would have treated "-a" and "+a" as the same argument, which would have been undesirable. By using custom type converters, we were able to differentiate between the two arguments. Conclusion ========== The "argparse" module offers a lot more than shown here. Its docs are quite detailed and thorough, and full of examples. Having gone through this tutorial, you should easily digest them without feeling overwhelmed. Argument Clinic How-To ********************** Note: The Argument Clinic How-To has been moved to the Python Developer’s Guide. Porting Extension Modules to Python 3 ************************************* We recommend the following resources for porting extension modules to Python 3: * The Migrating C extensions chapter from *Supporting Python 3: An in-depth guide*, a book on moving from Python 2 to Python 3 in general, guides the reader through porting an extension module. * The Porting guide from the *py3c* project provides opinionated suggestions with supporting code. * Recommended third party tools offer abstractions over Python’s C API. Extensions generally need to be re-written to use one of them, but the library then handles differences between various Python versions and implementations. Curses Programming with Python ****************************** Author: A.M. Kuchling, Eric S. Raymond Release: 2.04 Abstract ^^^^^^^^ This document describes how to use the "curses" extension module to control text-mode displays. What is curses? =============== The curses library supplies a terminal-independent screen-painting and keyboard-handling facility for text-based terminals; such terminals include VT100s, the Linux console, and the simulated terminal provided by various programs. Display terminals support various control codes to perform common operations such as moving the cursor, scrolling the screen, and erasing areas. Different terminals use widely differing codes, and often have their own minor quirks. In a world of graphical displays, one might ask “why bother”? It’s true that character-cell display terminals are an obsolete technology, but there are niches in which being able to do fancy things with them is still valuable. One niche is on small-footprint or embedded Unixes that don’t run an X server. Another is tools such as OS installers and kernel configurators that may have to run before any graphical support is available. The curses library provides fairly basic functionality, providing the programmer with an abstraction of a display containing multiple non-overlapping windows of text.
The contents of a window can be changed in various ways—adding text, erasing it, changing its appearance—and the curses library will figure out what control codes need to be sent to the terminal to produce the right output. curses doesn’t provide many user-interface concepts such as buttons, checkboxes, or dialogs; if you need such features, consider a user interface library such as Urwid. The curses library was originally written for BSD Unix; the later System V versions of Unix from AT&T added many enhancements and new functions. BSD curses is no longer maintained, having been replaced by ncurses, which is an open-source implementation of the AT&T interface. If you’re using an open-source Unix such as Linux or FreeBSD, your system almost certainly uses ncurses. Since most current commercial Unix versions are based on System V code, all the functions described here will probably be available. The older versions of curses carried by some proprietary Unixes may not support everything, though. The Windows version of Python doesn’t include the "curses" module. A ported version called UniCurses is available. The Python curses module ------------------------ The Python module is a fairly simple wrapper over the C functions provided by curses; if you’re already familiar with curses programming in C, it’s really easy to transfer that knowledge to Python. The biggest difference is that the Python interface makes things simpler by merging different C functions such as "addstr()", "mvaddstr()", and "mvwaddstr()" into a single "addstr()" method. You’ll see this covered in more detail later. This HOWTO is an introduction to writing text-mode programs with curses and Python. It doesn’t attempt to be a complete guide to the curses API; for that, see the Python library guide’s section on ncurses, and the C manual pages for ncurses. It will, however, give you the basic ideas. Starting and ending a curses application ======================================== Before doing anything, curses must be initialized. This is done by calling the "initscr()" function, which will determine the terminal type, send any required setup codes to the terminal, and create various internal data structures. If successful, "initscr()" returns a window object representing the entire screen; this is usually called "stdscr" after the name of the corresponding C variable. import curses stdscr = curses.initscr() Usually curses applications turn off automatic echoing of keys to the screen, in order to be able to read keys and only display them under certain circumstances. This requires calling the "noecho()" function. curses.noecho() Applications will also commonly need to react to keys instantly, without requiring the Enter key to be pressed; this is called cbreak mode, as opposed to the usual buffered input mode. curses.cbreak() Terminals usually return special keys, such as the cursor keys or navigation keys such as Page Up and Home, as a multibyte escape sequence. While you could write your application to expect such sequences and process them accordingly, curses can do it for you, returning a special value such as "curses.KEY_LEFT". To get curses to do the job, you’ll have to enable keypad mode. stdscr.keypad(True) Terminating a curses application is much easier than starting one. You’ll need to call: curses.nocbreak() stdscr.keypad(False) curses.echo() to reverse the curses-friendly terminal settings. Then call the "endwin()" function to restore the terminal to its original operating mode. 
A common problem when debugging a curses application is to get your
terminal messed up when the application dies without restoring the
terminal to its previous state. In Python this commonly happens when
your code is buggy and raises an uncaught exception. Keys are no
longer echoed to the screen when you type them, for example, which
makes using the shell difficult.

In Python you can avoid these complications and make debugging much
easier by importing the "curses.wrapper()" function and using it like
this:

   from curses import wrapper

   def main(stdscr):
       # Clear screen
       stdscr.clear()

       # This raises ZeroDivisionError when i == 10.
       for i in range(0, 11):
           v = i-10
           stdscr.addstr(i, 0, '10 divided by {} is {}'.format(v, 10/v))

       stdscr.refresh()
       stdscr.getkey()

   wrapper(main)

The "wrapper()" function takes a callable object and does the
initializations described above, also initializing colors if color
support is present. "wrapper()" then runs your provided callable.
Once the callable returns, "wrapper()" will restore the original
state of the terminal. The callable is called inside a
"try"…"except" that catches exceptions, restores the state of the
terminal, and then re-raises the exception. Therefore your terminal
won’t be left in a funny state on exception and you’ll be able to
read the exception’s message and traceback.

Windows and Pads
================

Windows are the basic abstraction in curses. A window object
represents a rectangular area of the screen, and supports methods to
display text, erase it, allow the user to input strings, and so
forth.

The "stdscr" object returned by the "initscr()" function is a window
object that covers the entire screen. Many programs may need only
this single window, but you might wish to divide the screen into
smaller windows, in order to redraw or clear them separately. The
"newwin()" function creates a new window of a given size, returning
the new window object.

   begin_x = 20; begin_y = 7
   height = 5; width = 40
   win = curses.newwin(height, width, begin_y, begin_x)

Note that the coordinate system used in curses is unusual.
Coordinates are always passed in the order *y,x*, and the top-left
corner of a window is coordinate (0,0). This breaks the normal
convention for handling coordinates where the *x* coordinate comes
first. This is an unfortunate difference from most other computer
applications, but it’s been part of curses since it was first
written, and it’s too late to change things now.

Your application can determine the size of the screen by using the
"curses.LINES" and "curses.COLS" variables to obtain the *y* and *x*
sizes. Legal coordinates will then extend from "(0,0)" to
"(curses.LINES - 1, curses.COLS - 1)".

When you call a method to display or erase text, the effect doesn’t
immediately show up on the display. Instead you must call the
"refresh()" method of window objects to update the screen.

This is because curses was originally written with slow 300-baud
terminal connections in mind; with these terminals, minimizing the
time required to redraw the screen was very important. Instead curses
accumulates changes to the screen and displays them in the most
efficient manner when you call "refresh()". For example, if your
program displays some text in a window and then clears the window,
there’s no need to send the original text because it’s never visible.

In practice, explicitly telling curses to redraw a window doesn’t
really complicate programming with curses much.
Most programs go into a flurry of activity, and then pause waiting
for a keypress or some other action on the part of the user. All you
have to do is to be sure that the screen has been redrawn before
pausing to wait for user input, by first calling "stdscr.refresh()"
or the "refresh()" method of some other relevant window.

A pad is a special case of a window; it can be larger than the actual
display screen, and only a portion of the pad is displayed at a time.
Creating a pad requires the pad’s height and width, while refreshing
a pad requires giving the coordinates of the on-screen area where a
subsection of the pad will be displayed.

   pad = curses.newpad(100, 100)
   # These loops fill the pad with letters; addch() is
   # explained in the next section
   for y in range(0, 99):
       for x in range(0, 99):
           pad.addch(y,x, ord('a') + (x*x+y*y) % 26)

   # Displays a section of the pad in the middle of the screen.
   # (0,0) : coordinate of upper-left corner of pad area to display.
   # (5,5) : coordinate of upper-left corner of window area to be filled
   #         with pad content.
   # (20, 75) : coordinate of lower-right corner of window area to be
   #          : filled with pad content.
   pad.refresh( 0,0, 5,5, 20,75)

The "refresh()" call displays a section of the pad in the rectangle
extending from coordinate (5,5) to coordinate (20,75) on the screen;
the upper left corner of the displayed section is coordinate (0,0) on
the pad. Beyond that difference, pads are exactly like ordinary
windows and support the same methods.

If you have multiple windows and pads on screen there is a more
efficient way to update the screen and prevent annoying screen
flicker as each part of the screen gets updated. "refresh()" actually
does two things:

1. Calls the "noutrefresh()" method of each window to update an
   underlying data structure representing the desired state of the
   screen.

2. Calls the "doupdate()" function to change the physical screen to
   match the desired state recorded in the data structure.

Instead you can call "noutrefresh()" on a number of windows to update
the data structure, and then call "doupdate()" to update the screen,
as the sketch below illustrates.
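As a minimal illustration of that batching idea, you might stage
changes to two windows and repaint the physical screen only once (the
window layout here is invented for the example):

   import curses

   def main(stdscr):
       # Two sub-windows sharing the screen: a one-line header and
       # a body window underneath it.
       header = curses.newwin(1, curses.COLS, 0, 0)
       body = curses.newwin(curses.LINES - 1, curses.COLS, 1, 0)

       header.addstr(0, 0, "Status: OK")
       body.addstr(0, 0, "Main content goes here")

       # Stage both windows' changes ...
       header.noutrefresh()
       body.noutrefresh()
       # ... then send everything to the terminal in one pass.
       curses.doupdate()

       body.getkey()

   curses.wrapper(main)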
Displaying Text
===============

From a C programmer’s point of view, curses may sometimes look like a
twisty maze of functions, all subtly different. For example,
"addstr()" displays a string at the current cursor location in the
"stdscr" window, while "mvaddstr()" moves to a given y,x coordinate
first before displaying the string. "waddstr()" is just like
"addstr()", but allows specifying a window to use instead of using
"stdscr" by default. "mvwaddstr()" allows specifying both a window
and a coordinate.

Fortunately the Python interface hides all these details. "stdscr" is
a window object like any other, and methods such as "addstr()" accept
multiple argument forms. Usually there are four different forms.

+-----------------------------------+-------------------------------------------------+
| Form                              | Description                                     |
|===================================|=================================================|
| *str* or *ch*                     | Display the string *str* or character *ch* at   |
|                                   | the current position                            |
+-----------------------------------+-------------------------------------------------+
| *str* or *ch*, *attr*             | Display the string *str* or character *ch*,     |
|                                   | using attribute *attr* at the current position  |
+-----------------------------------+-------------------------------------------------+
| *y*, *x*, *str* or *ch*           | Move to position *y,x* within the window, and   |
|                                   | display *str* or *ch*                           |
+-----------------------------------+-------------------------------------------------+
| *y*, *x*, *str* or *ch*, *attr*   | Move to position *y,x* within the window, and   |
|                                   | display *str* or *ch*, using attribute *attr*   |
+-----------------------------------+-------------------------------------------------+

Attributes allow displaying text in highlighted forms such as
boldface, underline, reverse code, or in color. They’ll be explained
in more detail in the next subsection.

The "addstr()" method takes a Python string or bytestring as the
value to be displayed. The contents of bytestrings are sent to the
terminal as-is. Strings are encoded to bytes using the value of the
window’s "encoding" attribute; this defaults to the default system
encoding as returned by "locale.getencoding()".

The "addch()" methods take a character, which can be either a string
of length 1, a bytestring of length 1, or an integer.

Constants are provided for extension characters; these constants are
integers greater than 255. For example, "ACS_PLMINUS" is a +/-
symbol, and "ACS_ULCORNER" is the upper left corner of a box (handy
for drawing borders). You can also use the appropriate Unicode
character.

Windows remember where the cursor was left after the last operation,
so if you leave out the *y,x* coordinates, the string or character
will be displayed wherever the last operation left off. You can also
move the cursor with the "move(y,x)" method. Because some terminals
always display a flashing cursor, you may want to ensure that the
cursor is positioned in some location where it won’t be distracting;
it can be confusing to have the cursor blinking at some apparently
random location.

If your application doesn’t need a blinking cursor at all, you can
call "curs_set(False)" to make it invisible. For compatibility with
older curses versions, there’s a "leaveok(bool)" function that’s a
synonym for "curs_set()". When *bool* is true, the curses library
will attempt to suppress the flashing cursor, and you won’t need to
worry about leaving it in odd locations.

Attributes and Color
--------------------

Characters can be displayed in different ways. Status lines in a
text-based application are commonly shown in reverse video, or a text
viewer may need to highlight certain words. curses supports this by
allowing you to specify an attribute for each cell on the screen.

An attribute is an integer, each bit representing a different
attribute. You can try to display text with multiple attribute bits
set, but curses doesn’t guarantee that all the possible combinations
are available, or that they’re all visually distinct. That depends on
the ability of the terminal being used, so it’s safest to stick to
the most commonly available attributes, listed here.
+------------------------+----------------------------------------+
| Attribute              | Description                            |
|========================|========================================|
| "A_BLINK"              | Blinking text                          |
+------------------------+----------------------------------------+
| "A_BOLD"               | Extra bright or bold text              |
+------------------------+----------------------------------------+
| "A_DIM"                | Half bright text                       |
+------------------------+----------------------------------------+
| "A_REVERSE"            | Reverse-video text                     |
+------------------------+----------------------------------------+
| "A_STANDOUT"           | The best highlighting mode available   |
+------------------------+----------------------------------------+
| "A_UNDERLINE"          | Underlined text                        |
+------------------------+----------------------------------------+

So, to display a reverse-video status line on the top line of the
screen, you could code:

   stdscr.addstr(0, 0, "Current mode: Typing mode", curses.A_REVERSE)
   stdscr.refresh()

The curses library also supports color on those terminals that
provide it. The most common such terminal is probably the Linux
console, followed by color xterms.

To use color, you must call the "start_color()" function soon after
calling "initscr()", to initialize the default color set (the
"curses.wrapper()" function does this automatically). Once that’s
done, the "has_colors()" function returns "True" if the terminal in
use can actually display color. (Note: curses uses the American
spelling ‘color’, instead of the Canadian/British spelling ‘colour’.
If you’re used to the British spelling, you’ll have to resign
yourself to misspelling it for the sake of these functions.)

The curses library maintains a finite number of color pairs,
containing a foreground (or text) color and a background color. You
can get the attribute value corresponding to a color pair with the
"color_pair()" function; this can be bitwise-OR’ed with other
attributes such as "A_REVERSE", but again, such combinations are not
guaranteed to work on all terminals.

An example, which displays a line of text using color pair 1:

   stdscr.addstr("Pretty text", curses.color_pair(1))
   stdscr.refresh()

As I said before, a color pair consists of a foreground and
background color. The "init_pair(n, f, b)" function changes the
definition of color pair *n*, to foreground color *f* and background
color *b*. Color pair 0 is hard-wired to white on black, and cannot
be changed.

Colors are numbered, and "start_color()" initializes 8 basic colors
when it activates color mode. They are: 0:black, 1:red, 2:green,
3:yellow, 4:blue, 5:magenta, 6:cyan, and 7:white. The "curses" module
defines named constants for each of these colors:
"curses.COLOR_BLACK", "curses.COLOR_RED", and so forth.

Let’s put all this together. To change color 1 to red text on a white
background, you would call:

   curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)

When you change a color pair, any text already displayed using that
color pair will change to the new colors. You can also display new
text in this color with:

   stdscr.addstr(0,0, "RED ALERT!", curses.color_pair(1))

Very fancy terminals can change the definitions of the actual colors
to a given RGB value. This lets you change color 1, which is usually
red, to purple or blue or any other color you like. Unfortunately,
the Linux console doesn’t support this, so I’m unable to try it out,
and can’t provide any examples. You can check if your terminal can do
this by calling "can_change_color()", which returns "True" if the
capability is there. If you’re lucky enough to have such a talented
terminal, consult your system’s man pages for more information.
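Putting the color machinery together, here is a minimal sketch run
under "wrapper()" (which calls "start_color()" for you); the fallback
to "A_REVERSE" is a stylistic choice for terminals without color:

   import curses

   def main(stdscr):
       # wrapper() has already called start_color() on our behalf.
       if curses.has_colors():
           curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
           attr = curses.color_pair(1)
       else:
           # Fall back to an attribute almost every terminal supports.
           attr = curses.A_REVERSE
       stdscr.addstr(0, 0, "RED ALERT!", attr)
       stdscr.refresh()
       stdscr.getkey()

   curses.wrapper(main)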
User Input
==========

The C curses library offers only very simple input mechanisms.
Python’s "curses" module adds a basic text-input widget. (Other
libraries such as Urwid have more extensive collections of widgets.)

There are two methods for getting input from a window:

* "getch()" refreshes the screen and then waits for the user to hit
  a key, displaying the key if "echo()" has been called earlier. You
  can optionally specify a coordinate to which the cursor should be
  moved before pausing.

* "getkey()" does the same thing but converts the integer to a
  string. Individual characters are returned as 1-character strings,
  and special keys such as function keys return longer strings
  containing a key name such as "KEY_UP" or "^G".

It’s possible to not wait for the user using the "nodelay()" window
method. After "nodelay(True)", "getch()" and "getkey()" for the
window become non-blocking. To signal that no input is ready,
"getch()" returns "curses.ERR" (a value of -1) and "getkey()" raises
an exception. There’s also a "halfdelay()" function, which can be
used to (in effect) set a timer on each "getch()"; if no input
becomes available within a specified delay (measured in tenths of a
second), curses raises an exception.

The "getch()" method returns an integer; if it’s between 0 and 255,
it represents the ASCII code of the key pressed. Values greater than
255 are special keys such as Page Up, Home, or the cursor keys. You
can compare the value returned to constants such as
"curses.KEY_PPAGE", "curses.KEY_HOME", or "curses.KEY_LEFT". The main
loop of your program may look something like this:

   while True:
       c = stdscr.getch()
       if c == ord('p'):
           PrintDocument()
       elif c == ord('q'):
           break  # Exit the while loop
       elif c == curses.KEY_HOME:
           x = y = 0

The "curses.ascii" module supplies ASCII class membership functions
that take either integer or 1-character string arguments; these may
be useful in writing more readable tests for such loops. It also
supplies conversion functions that take either integer or
1-character-string arguments and return the same type. For example,
"curses.ascii.ctrl()" returns the control character corresponding to
its argument.

There’s also a method to retrieve an entire string, "getstr()". It
isn’t used very often, because its functionality is quite limited;
the only editing keys available are the backspace key and the Enter
key, which terminates the string. It can optionally be limited to a
fixed number of characters.

   curses.echo()            # Enable echoing of characters

   # Get a 15-character string, with the cursor on the top line
   s = stdscr.getstr(0,0, 15)

The "curses.textpad" module supplies a text box that supports an
Emacs-like set of keybindings. Various methods of the "Textbox" class
support editing with input validation and gathering the edit results
either with or without trailing spaces. Here’s an example:

   import curses
   from curses.textpad import Textbox, rectangle

   def main(stdscr):
       stdscr.addstr(0, 0, "Enter IM message: (hit Ctrl-G to send)")

       editwin = curses.newwin(5,30, 2,1)
       rectangle(stdscr, 1,0, 1+5+1, 1+30+1)
       stdscr.refresh()

       box = Textbox(editwin)

       # Let the user edit until Ctrl-G is struck.
       box.edit()

       # Get resulting contents
       message = box.gather()

See the library documentation on "curses.textpad" for more details.
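To wrap up the tutorial portion, here is a small self-contained
sketch that ties the earlier pieces together: "wrapper()", an
attribute, a color pair, and a "getch()" event loop. The quit key,
message text, and movement rules are arbitrary choices for the
example:

   import curses

   def main(stdscr):
       curses.curs_set(False)      # Hide the blinking cursor
       if curses.has_colors():
           curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
           attr = curses.color_pair(1)
       else:
           attr = curses.A_REVERSE

       x = 0
       while True:
           stdscr.erase()
           stdscr.addstr(0, 0, "Left/right arrows move; 'q' quits",
                         curses.A_BOLD)
           stdscr.addstr(1, x, "*", attr)
           stdscr.refresh()

           c = stdscr.getch()      # Blocks until a key is pressed
           if c == ord('q'):
               break
           elif c == curses.KEY_LEFT and x > 0:
               x -= 1
           elif c == curses.KEY_RIGHT and x < curses.COLS - 2:
               x += 1

   curses.wrapper(main)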
For More Information
====================

This HOWTO doesn’t cover some advanced topics, such as reading the
contents of the screen or capturing mouse events from an xterm
instance, but the Python library page for the "curses" module is now
reasonably complete. You should browse it next.

If you’re in doubt about the detailed behavior of the curses
functions, consult the manual pages for your curses implementation,
whether it’s ncurses or a proprietary Unix vendor’s. The manual pages
will document any quirks, and provide complete lists of all the
functions, attributes, and ACS_* characters available to you.

Because the curses API is so large, some functions aren’t supported
in the Python interface. Often this isn’t because they’re difficult
to implement, but because no one has needed them yet. Also, Python
doesn’t yet support the menu library associated with ncurses. Patches
adding support for these would be welcome; see the Python Developer’s
Guide to learn more about submitting patches to Python.

* Writing Programs with NCURSES: a lengthy tutorial for C
  programmers.

* The ncurses man page

* The ncurses FAQ

* “Use curses… don’t swear”: video of a PyCon 2013 talk on
  controlling terminals using curses or Urwid.

* “Console Applications with Urwid”: video of a PyCon CA 2012 talk
  demonstrating some applications written using Urwid.

Descriptor Guide
****************

Author:
   Raymond Hettinger

Contact:
   <python at rcn dot com>

Contents
^^^^^^^^

* Descriptor Guide

* Primer

* Simple example: A descriptor that returns a constant

* Dynamic lookups

* Managed attributes

* Customized names

* Closing thoughts

* Complete Practical Example

* Validator class

* Custom validators

* Practical application

* Technical Tutorial

* Abstract

* Definition and introduction

* Descriptor protocol

* Overview of descriptor invocation

* Invocation from an instance

* Invocation from a class

* Invocation from super

* Summary of invocation logic

* Automatic name notification

* ORM example

* Pure Python Equivalents

* Properties

* Functions and methods

* Kinds of methods

* Static methods

* Class methods

* Member objects and __slots__

*Descriptors* let objects customize attribute lookup, storage, and
deletion.

This guide has four major sections:

1. The “primer” gives a basic overview, moving gently from simple
   examples, adding one feature at a time. Start here if you’re new
   to descriptors.

2. The second section shows a complete, practical descriptor example.
   If you already know the basics, start there.

3. The third section provides a more technical tutorial that goes
   into the detailed mechanics of how descriptors work. Most people
   don’t need this level of detail.

4. The last section has pure Python equivalents for built-in
   descriptors that are written in C. Read this if you’re curious
   about how functions turn into bound methods or about the
   implementation of common tools like "classmethod()",
   "staticmethod()", "property()", and *__slots__*.

Primer
======

In this primer, we start with the most basic possible example and
then we’ll add new capabilities one by one.
Simple example: A descriptor that returns a constant
----------------------------------------------------

The "Ten" class is a descriptor whose "__get__()" method always
returns the constant "10":

   class Ten:
       def __get__(self, obj, objtype=None):
           return 10

To use the descriptor, it must be stored as a class variable in
another class:

   class A:
       x = 5                       # Regular class attribute
       y = Ten()                   # Descriptor instance

An interactive session shows the difference between normal attribute
lookup and descriptor lookup:

   >>> a = A()                     # Make an instance of class A
   >>> a.x                         # Normal attribute lookup
   5
   >>> a.y                         # Descriptor lookup
   10

In the "a.x" attribute lookup, the dot operator finds "'x': 5" in the
class dictionary. In the "a.y" lookup, the dot operator finds a
descriptor instance, recognized by its "__get__" method. Calling that
method returns "10".

Note that the value "10" is not stored in either the class dictionary
or the instance dictionary. Instead, the value "10" is computed on
demand.

This example shows how a simple descriptor works, but it isn’t very
useful. For retrieving constants, normal attribute lookup would be
better.

In the next section, we’ll create something more useful, a dynamic
lookup.

Dynamic lookups
---------------

Interesting descriptors typically run computations instead of
returning constants:

   import os

   class DirectorySize:

       def __get__(self, obj, objtype=None):
           return len(os.listdir(obj.dirname))

   class Directory:

       size = DirectorySize()              # Descriptor instance

       def __init__(self, dirname):
           self.dirname = dirname          # Regular instance attribute

An interactive session shows that the lookup is dynamic — it computes
different, updated answers each time:

   >>> s = Directory('songs')
   >>> g = Directory('games')
   >>> s.size                              # The songs directory has twenty files
   20
   >>> g.size                              # The games directory has three files
   3
   >>> os.remove('games/chess')            # Delete a game
   >>> g.size                              # File count is automatically updated
   2

Besides showing how descriptors can run computations, this example
also reveals the purpose of the parameters to "__get__()". The *self*
parameter is *size*, an instance of *DirectorySize*. The *obj*
parameter is either *g* or *s*, an instance of *Directory*. It is the
*obj* parameter that lets the "__get__()" method learn the target
directory. The *objtype* parameter is the class *Directory*.

Managed attributes
------------------

A popular use for descriptors is managing access to instance data.
The descriptor is assigned to a public attribute in the class
dictionary while the actual data is stored as a private attribute in
the instance dictionary. The descriptor’s "__get__()" and "__set__()"
methods are triggered when the public attribute is accessed.

In the following example, *age* is the public attribute and *_age* is
the private attribute.
When the public attribute is accessed, the descriptor logs the lookup
or update:

   import logging

   logging.basicConfig(level=logging.INFO)

   class LoggedAgeAccess:

       def __get__(self, obj, objtype=None):
           value = obj._age
           logging.info('Accessing %r giving %r', 'age', value)
           return value

       def __set__(self, obj, value):
           logging.info('Updating %r to %r', 'age', value)
           obj._age = value

   class Person:

       age = LoggedAgeAccess()             # Descriptor instance

       def __init__(self, name, age):
           self.name = name                # Regular instance attribute
           self.age = age                  # Calls __set__()

       def birthday(self):
           self.age += 1                   # Calls both __get__() and __set__()

An interactive session shows that all access to the managed attribute
*age* is logged, but that the regular attribute *name* is not logged:

   >>> mary = Person('Mary M', 30)         # The initial age update is logged
   INFO:root:Updating 'age' to 30
   >>> dave = Person('David D', 40)
   INFO:root:Updating 'age' to 40

   >>> vars(mary)                          # The actual data is in a private attribute
   {'name': 'Mary M', '_age': 30}
   >>> vars(dave)
   {'name': 'David D', '_age': 40}

   >>> mary.age                            # Access the data and log the lookup
   INFO:root:Accessing 'age' giving 30
   30
   >>> mary.birthday()                     # Updates are logged as well
   INFO:root:Accessing 'age' giving 30
   INFO:root:Updating 'age' to 31

   >>> dave.name                           # Regular attribute lookup isn't logged
   'David D'
   >>> dave.age                            # Only the managed attribute is logged
   INFO:root:Accessing 'age' giving 40
   40

One major issue with this example is that the private name *_age* is
hardwired in the *LoggedAgeAccess* class. That means that each
instance can only have one logged attribute and that its name is
unchangeable. In the next example, we’ll fix that problem.

Customized names
----------------

When a class uses descriptors, it can inform each descriptor about
which variable name was used.

In this example, the "Person" class has two descriptor instances,
*name* and *age*. When the "Person" class is defined, it makes a
callback to "__set_name__()" in *LoggedAccess* so that the field
names can be recorded, giving each descriptor its own *public_name*
and *private_name*:

   import logging

   logging.basicConfig(level=logging.INFO)

   class LoggedAccess:

       def __set_name__(self, owner, name):
           self.public_name = name
           self.private_name = '_' + name

       def __get__(self, obj, objtype=None):
           value = getattr(obj, self.private_name)
           logging.info('Accessing %r giving %r', self.public_name, value)
           return value

       def __set__(self, obj, value):
           logging.info('Updating %r to %r', self.public_name, value)
           setattr(obj, self.private_name, value)

   class Person:

       name = LoggedAccess()                # First descriptor instance
       age = LoggedAccess()                 # Second descriptor instance

       def __init__(self, name, age):
           self.name = name                 # Calls the first descriptor
           self.age = age                   # Calls the second descriptor

       def birthday(self):
           self.age += 1

An interactive session shows that the "Person" class has called
"__set_name__()" so that the field names would be recorded.
Here we call "vars()" to look up the descriptor without triggering it: >>> vars(vars(Person)['name']) {'public_name': 'name', 'private_name': '_name'} >>> vars(vars(Person)['age']) {'public_name': 'age', 'private_name': '_age'} The new class now logs access to both *name* and *age*: >>> pete = Person('Peter P', 10) INFO:root:Updating 'name' to 'Peter P' INFO:root:Updating 'age' to 10 >>> kate = Person('Catherine C', 20) INFO:root:Updating 'name' to 'Catherine C' INFO:root:Updating 'age' to 20 The two *Person* instances contain only the private names: >>> vars(pete) {'_name': 'Peter P', '_age': 10} >>> vars(kate) {'_name': 'Catherine C', '_age': 20} Closing thoughts ---------------- A *descriptor* is what we call any object that defines "__get__()", "__set__()", or "__delete__()". Optionally, descriptors can have a "__set_name__()" method. This is only used in cases where a descriptor needs to know either the class where it was created or the name of class variable it was assigned to. (This method, if present, is called even if the class is not a descriptor.) Descriptors get invoked by the dot operator during attribute lookup. If a descriptor is accessed indirectly with "vars(some_class)[descriptor_name]", the descriptor instance is returned without invoking it. Descriptors only work when used as class variables. When put in instances, they have no effect. The main motivation for descriptors is to provide a hook allowing objects stored in class variables to control what happens during attribute lookup. Traditionally, the calling class controls what happens during lookup. Descriptors invert that relationship and allow the data being looked- up to have a say in the matter. Descriptors are used throughout the language. It is how functions turn into bound methods. Common tools like "classmethod()", "staticmethod()", "property()", and "functools.cached_property()" are all implemented as descriptors. Complete Practical Example ========================== In this example, we create a practical and powerful tool for locating notoriously hard to find data corruption bugs. Validator class --------------- A validator is a descriptor for managed attribute access. Prior to storing any data, it verifies that the new value meets various type and range restrictions. If those restrictions aren’t met, it raises an exception to prevent data corruption at its source. This "Validator" class is both an *abstract base class* and a managed attribute descriptor: from abc import ABC, abstractmethod class Validator(ABC): def __set_name__(self, owner, name): self.private_name = '_' + name def __get__(self, obj, objtype=None): return getattr(obj, self.private_name) def __set__(self, obj, value): self.validate(value) setattr(obj, self.private_name, value) @abstractmethod def validate(self, value): pass Custom validators need to inherit from "Validator" and must supply a "validate()" method to test various restrictions as needed. Custom validators ----------------- Here are three practical data validation utilities: 1. "OneOf" verifies that a value is one of a restricted set of options. 2. "Number" verifies that a value is either an "int" or "float". Optionally, it verifies that a value is between a given minimum or maximum. 3. "String" verifies that a value is a "str". Optionally, it validates a given minimum or maximum length. It can validate a user-defined predicate as well. 
   class OneOf(Validator):

       def __init__(self, *options):
           self.options = set(options)

       def validate(self, value):
           if value not in self.options:
               raise ValueError(
                   f'Expected {value!r} to be one of {self.options!r}'
               )

   class Number(Validator):

       def __init__(self, minvalue=None, maxvalue=None):
           self.minvalue = minvalue
           self.maxvalue = maxvalue

       def validate(self, value):
           if not isinstance(value, (int, float)):
               raise TypeError(f'Expected {value!r} to be an int or float')
           if self.minvalue is not None and value < self.minvalue:
               raise ValueError(
                   f'Expected {value!r} to be at least {self.minvalue!r}'
               )
           if self.maxvalue is not None and value > self.maxvalue:
               raise ValueError(
                   f'Expected {value!r} to be no more than {self.maxvalue!r}'
               )

   class String(Validator):

       def __init__(self, minsize=None, maxsize=None, predicate=None):
           self.minsize = minsize
           self.maxsize = maxsize
           self.predicate = predicate

       def validate(self, value):
           if not isinstance(value, str):
               raise TypeError(f'Expected {value!r} to be an str')
           if self.minsize is not None and len(value) < self.minsize:
               raise ValueError(
                   f'Expected {value!r} to be no smaller than {self.minsize!r}'
               )
           if self.maxsize is not None and len(value) > self.maxsize:
               raise ValueError(
                   f'Expected {value!r} to be no bigger than {self.maxsize!r}'
               )
           if self.predicate is not None and not self.predicate(value):
               raise ValueError(
                   f'Expected {self.predicate} to be true for {value!r}'
               )

Practical application
---------------------

Here’s how the data validators can be used in a real class:

   class Component:

       name = String(minsize=3, maxsize=10, predicate=str.isupper)
       kind = OneOf('wood', 'metal', 'plastic')
       quantity = Number(minvalue=0)

       def __init__(self, name, kind, quantity):
           self.name = name
           self.kind = kind
           self.quantity = quantity

The descriptors prevent invalid instances from being created:

   >>> Component('Widget', 'metal', 5)      # Blocked: 'Widget' is not all uppercase
   Traceback (most recent call last):
       ...
   ValueError: Expected <method 'isupper' of 'str' objects> to be true for 'Widget'

   >>> Component('WIDGET', 'metle', 5)      # Blocked: 'metle' is misspelled
   Traceback (most recent call last):
       ...
   ValueError: Expected 'metle' to be one of {'metal', 'plastic', 'wood'}

   >>> Component('WIDGET', 'metal', -5)     # Blocked: -5 is negative
   Traceback (most recent call last):
       ...
   ValueError: Expected -5 to be at least 0

   >>> Component('WIDGET', 'metal', 'V')    # Blocked: 'V' isn't a number
   Traceback (most recent call last):
       ...
   TypeError: Expected 'V' to be an int or float

   >>> c = Component('WIDGET', 'metal', 5)  # Allowed:  The inputs are valid

Technical Tutorial
==================

What follows is a more technical tutorial for the mechanics and
details of how descriptors work.

Abstract
--------

Defines descriptors, summarizes the protocol, and shows how
descriptors are called. Provides an example showing how object
relational mappings work.

Learning about descriptors not only provides access to a larger
toolset, it creates a deeper understanding of how Python works.

Definition and introduction
---------------------------

In general, a descriptor is an attribute value that has one of the
methods in the descriptor protocol. Those methods are "__get__()",
"__set__()", and "__delete__()". If any of those methods are defined
for an attribute, it is said to be a *descriptor*.

The default behavior for attribute access is to get, set, or delete
the attribute from an object’s dictionary.
For instance, "a.x" has a lookup chain starting with "a.__dict__['x']", then "type(a).__dict__['x']", and continuing through the method resolution order of "type(a)". If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined. Descriptors are a powerful, general purpose protocol. They are the mechanism behind properties, methods, static methods, class methods, and "super()". They are used throughout Python itself. Descriptors simplify the underlying C code and offer a flexible set of new tools for everyday Python programs. Descriptor protocol ------------------- "descr.__get__(self, obj, type=None)" "descr.__set__(self, obj, value)" "descr.__delete__(self, obj)" That is all there is to it. Define any of these methods and an object is considered a descriptor and can override default behavior upon being looked up as an attribute. If an object defines "__set__()" or "__delete__()", it is considered a data descriptor. Descriptors that only define "__get__()" are called non-data descriptors (they are often used for methods but other uses are possible). Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary. If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence. To make a read-only data descriptor, define both "__get__()" and "__set__()" with the "__set__()" raising an "AttributeError" when called. Defining the "__set__()" method with an exception raising placeholder is enough to make it a data descriptor. Overview of descriptor invocation --------------------------------- A descriptor can be called directly with "desc.__get__(obj)" or "desc.__get__(None, cls)". But it is more common for a descriptor to be invoked automatically from attribute access. The expression "obj.x" looks up the attribute "x" in the chain of namespaces for "obj". If the search finds a descriptor outside of the instance "__dict__", its "__get__()" method is invoked according to the precedence rules listed below. The details of invocation depend on whether "obj" is an object, class, or instance of super. Invocation from an instance --------------------------- Instance lookup scans through a chain of namespaces giving data descriptors the highest priority, followed by instance variables, then non-data descriptors, then class variables, and lastly "__getattr__()" if it is provided. If a descriptor is found for "a.x", then it is invoked with: "desc.__get__(a, type(a))". The logic for a dotted lookup is in "object.__getattribute__()". 
Overview of descriptor invocation
---------------------------------

A descriptor can be called directly with "desc.__get__(obj)" or
"desc.__get__(None, cls)".

But it is more common for a descriptor to be invoked automatically
from attribute access.

The expression "obj.x" looks up the attribute "x" in the chain of
namespaces for "obj". If the search finds a descriptor outside of the
instance "__dict__", its "__get__()" method is invoked according to
the precedence rules listed below.

The details of invocation depend on whether "obj" is an object,
class, or instance of super.

Invocation from an instance
---------------------------

Instance lookup scans through a chain of namespaces giving data
descriptors the highest priority, followed by instance variables,
then non-data descriptors, then class variables, and lastly
"__getattr__()" if it is provided.

If a descriptor is found for "a.x", then it is invoked with:
"desc.__get__(a, type(a))".

The logic for a dotted lookup is in "object.__getattribute__()". Here
is a pure Python equivalent:

   def find_name_in_mro(cls, name, default):
       "Emulate _PyType_Lookup() in Objects/typeobject.c"
       for base in cls.__mro__:
           if name in vars(base):
               return vars(base)[name]
       return default

   def object_getattribute(obj, name):
       "Emulate PyObject_GenericGetAttr() in Objects/object.c"
       null = object()
       objtype = type(obj)
       cls_var = find_name_in_mro(objtype, name, null)
       descr_get = getattr(type(cls_var), '__get__', null)
       if descr_get is not null:
           if (hasattr(type(cls_var), '__set__')
               or hasattr(type(cls_var), '__delete__')):
               return descr_get(cls_var, obj, objtype)     # data descriptor
       if hasattr(obj, '__dict__') and name in vars(obj):
           return vars(obj)[name]                          # instance variable
       if descr_get is not null:
           return descr_get(cls_var, obj, objtype)         # non-data descriptor
       if cls_var is not null:
           return cls_var                                  # class variable
       raise AttributeError(name)

Note, there is no "__getattr__()" hook in the "__getattribute__()"
code. That is why calling "__getattribute__()" directly or with
"super().__getattribute__" will bypass "__getattr__()" entirely.

Instead, it is the dot operator and the "getattr()" function that are
responsible for invoking "__getattr__()" whenever
"__getattribute__()" raises an "AttributeError". Their logic is
encapsulated in a helper function:

   def getattr_hook(obj, name):
       "Emulate slot_tp_getattr_hook() in Objects/typeobject.c"
       try:
           return obj.__getattribute__(name)
       except AttributeError:
           if not hasattr(type(obj), '__getattr__'):
               raise
       return type(obj).__getattr__(obj, name)             # __getattr__

Invocation from a class
-----------------------

The logic for a dotted lookup such as "A.x" is in
"type.__getattribute__()". The steps are similar to those for
"object.__getattribute__()" but the instance dictionary lookup is
replaced by a search through the class’s *method resolution order*.

If a descriptor is found, it is invoked with "desc.__get__(None, A)".

The full C implementation can be found in "type_getattro()" and
"_PyType_Lookup()" in Objects/typeobject.c.

Invocation from super
---------------------

The logic for super’s dotted lookup is in the "__getattribute__()"
method for object returned by "super()".

A dotted lookup such as "super(A, obj).m" searches
"obj.__class__.__mro__" for the base class "B" immediately following
"A" and then returns "B.__dict__['m'].__get__(obj, A)". If not a
descriptor, "m" is returned unchanged.

The full C implementation can be found in "super_getattro()" in
Objects/typeobject.c. A pure Python equivalent can be found in
Guido’s Tutorial.

Summary of invocation logic
---------------------------

The mechanism for descriptors is embedded in the "__getattribute__()"
methods for "object", "type", and "super()".

The important points to remember are:

* Descriptors are invoked by the "__getattribute__()" method.

* Classes inherit this machinery from "object", "type", or
  "super()".

* Overriding "__getattribute__()" prevents automatic descriptor
  calls because all the descriptor logic is in that method.

* "object.__getattribute__()" and "type.__getattribute__()" make
  different calls to "__get__()". The first includes the instance and
  may include the class. The second puts in "None" for the instance
  and always includes the class.

* Data descriptors always override instance dictionaries.

* Non-data descriptors may be overridden by instance dictionaries.

Automatic name notification
---------------------------

Sometimes it is desirable for a descriptor to know what class
variable name it was assigned to. When a new class is created, the
"type" metaclass scans the dictionary of the new class. If any of the
entries are descriptors and if they define "__set_name__()", that
method is called with two arguments. The *owner* is the class where
the descriptor is used, and the *name* is the class variable the
descriptor was assigned to.

The implementation details are in "type_new()" and "set_names()" in
Objects/typeobject.c.

Since the update logic is in "type.__new__()", notifications only
take place at the time of class creation. If descriptors are added to
the class afterwards, "__set_name__()" will need to be called
manually.
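For instance, a descriptor attached after class creation can be
notified by hand. A short sketch, reusing the "LoggedAccess"
descriptor from the primer above:

   class Person:
       pass

   # The class already exists, so type.__new__() never saw this
   # descriptor and no automatic callback occurred.
   descr = LoggedAccess()
   Person.name = descr
   descr.__set_name__(Person, 'name')   # Manual notification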
ORM example
-----------

The following code is a simplified skeleton showing how data
descriptors could be used to implement an object relational mapping.

The essential idea is that the data is stored in an external
database. The Python instances only hold keys to the database’s
tables. Descriptors take care of lookups or updates:

   class Field:

       def __set_name__(self, owner, name):
           self.fetch = f'SELECT {name} FROM {owner.table} WHERE {owner.key}=?;'
           self.store = f'UPDATE {owner.table} SET {name}=? WHERE {owner.key}=?;'

       def __get__(self, obj, objtype=None):
           return conn.execute(self.fetch, [obj.key]).fetchone()[0]

       def __set__(self, obj, value):
           conn.execute(self.store, [value, obj.key])
           conn.commit()

We can use the "Field" class to define models that describe the
schema for each table in a database:

   class Movie:
       table = 'Movies'                    # Table name
       key = 'title'                       # Primary key
       director = Field()
       year = Field()

       def __init__(self, key):
           self.key = key

   class Song:
       table = 'Music'
       key = 'title'
       artist = Field()
       year = Field()
       genre = Field()

       def __init__(self, key):
           self.key = key

To use the models, first connect to the database:

   >>> import sqlite3
   >>> conn = sqlite3.connect('entertainment.db')

An interactive session shows how data is retrieved from the database
and how it can be updated:

   >>> Movie('Star Wars').director
   'George Lucas'
   >>> jaws = Movie('Jaws')
   >>> f'Released in {jaws.year} by {jaws.director}'
   'Released in 1975 by Steven Spielberg'

   >>> Song('Country Roads').artist
   'John Denver'

   >>> Movie('Star Wars').director = 'J.J. Abrams'
   >>> Movie('Star Wars').director
   'J.J. Abrams'

Pure Python Equivalents
=======================

The descriptor protocol is simple and offers exciting possibilities.
Several use cases are so common that they have been prepackaged into
built-in tools. Properties, bound methods, static methods, class
methods, and __slots__ are all based on the descriptor protocol.

Properties
----------

Calling "property()" is a succinct way of building a data descriptor
that triggers a function call upon access to an attribute.
Its signature is:

   property(fget=None, fset=None, fdel=None, doc=None) -> property

The documentation shows a typical use to define a managed attribute
"x":

   class C:
       def getx(self): return self.__x
       def setx(self, value): self.__x = value
       def delx(self): del self.__x
       x = property(getx, setx, delx, "I'm the 'x' property.")

To see how "property()" is implemented in terms of the descriptor
protocol, here is a pure Python equivalent that implements most of
the core functionality:

   class Property:
       "Emulate PyProperty_Type() in Objects/descrobject.c"

       def __init__(self, fget=None, fset=None, fdel=None, doc=None):
           self.fget = fget
           self.fset = fset
           self.fdel = fdel
           if doc is None and fget is not None:
               doc = fget.__doc__
           self.__doc__ = doc

       def __set_name__(self, owner, name):
           self.__name__ = name

       def __get__(self, obj, objtype=None):
           if obj is None:
               return self
           if self.fget is None:
               raise AttributeError
           return self.fget(obj)

       def __set__(self, obj, value):
           if self.fset is None:
               raise AttributeError
           self.fset(obj, value)

       def __delete__(self, obj):
           if self.fdel is None:
               raise AttributeError
           self.fdel(obj)

       def getter(self, fget):
           return type(self)(fget, self.fset, self.fdel, self.__doc__)

       def setter(self, fset):
           return type(self)(self.fget, fset, self.fdel, self.__doc__)

       def deleter(self, fdel):
           return type(self)(self.fget, self.fset, fdel, self.__doc__)

The "property()" builtin helps whenever a user interface has granted
attribute access and then subsequent changes require the intervention
of a method.

For instance, a spreadsheet class may grant access to a cell value
through "Cell('b10').value". Subsequent improvements to the program
require the cell to be recalculated on every access; however, the
programmer does not want to affect existing client code accessing the
attribute directly. The solution is to wrap access to the value
attribute in a property data descriptor:

   class Cell:
       ...

       @property
       def value(self):
           "Recalculate the cell before returning value"
           self.recalc()
           return self._value

Either the built-in "property()" or our "Property()" equivalent would
work in this example.

Functions and methods
---------------------

Python’s object oriented features are built upon a function based
environment. Using non-data descriptors, the two are merged
seamlessly.

Functions stored in class dictionaries get turned into methods when
invoked. Methods only differ from regular functions in that the
object instance is prepended to the other arguments. By convention,
the instance is called *self* but could be called *this* or any other
variable name.

Methods can be created manually with "types.MethodType" which is
roughly equivalent to:

   class MethodType:
       "Emulate PyMethod_Type in Objects/classobject.c"

       def __init__(self, func, obj):
           self.__func__ = func
           self.__self__ = obj

       def __call__(self, *args, **kwargs):
           func = self.__func__
           obj = self.__self__
           return func(obj, *args, **kwargs)

       def __getattribute__(self, name):
           "Emulate method_getset() in Objects/classobject.c"
           if name == '__doc__':
               return self.__func__.__doc__
           return object.__getattribute__(self, name)

       def __getattr__(self, name):
           "Emulate method_getattro() in Objects/classobject.c"
           return getattr(self.__func__, name)

       def __get__(self, obj, objtype=None):
           "Emulate method_descr_get() in Objects/classobject.c"
           return self

To support automatic creation of methods, functions include the
"__get__()" method for binding methods during attribute access. This
means that functions are non-data descriptors that return bound
methods during dotted lookup from an instance.
Here’s how it works:

   class Function:
       ...

       def __get__(self, obj, objtype=None):
           "Simulate func_descr_get() in Objects/funcobject.c"
           if obj is None:
               return self
           return MethodType(self, obj)

Running the following class in the interpreter shows how the function
descriptor works in practice:

   class D:
       def f(self):
           return self

   class D2:
       pass

The function has a *qualified name* attribute to support
introspection:

   >>> D.f.__qualname__
   'D.f'

Accessing the function through the class dictionary does not invoke
"__get__()". Instead, it just returns the underlying function object:

   >>> D.__dict__['f']
   <function D.f at 0x00C45070>

Dotted access from a class calls "__get__()" which just returns the
underlying function unchanged:

   >>> D.f
   <function D.f at 0x00C45070>

The interesting behavior occurs during dotted access from an
instance. The dotted lookup calls "__get__()" which returns a bound
method object:

   >>> d = D()
   >>> d.f
   <bound method D.f of <__main__.D object at 0x00B18C90>>

Internally, the bound method stores the underlying function and the
bound instance:

   >>> d.f.__func__
   <function D.f at 0x00C45070>

   >>> d.f.__self__
   <__main__.D object at 0x00B18C90>

If you have ever wondered where *self* comes from in regular methods
or where *cls* comes from in class methods, this is it!

Kinds of methods
----------------

Non-data descriptors provide a simple mechanism for variations on the
usual patterns of binding functions into methods.

To recap, functions have a "__get__()" method so that they can be
converted to a method when accessed as attributes. The non-data
descriptor transforms an "obj.f(*args)" call into "f(obj, *args)".
Calling "cls.f(*args)" becomes "f(*args)".

This chart summarizes the binding and its two most useful variants:

+-------------------+------------------------+--------------------+
| Transformation    | Called from an object  | Called from a      |
|                   |                        | class              |
|===================|========================|====================|
| function          | f(obj, *args)          | f(*args)           |
+-------------------+------------------------+--------------------+
| staticmethod      | f(*args)               | f(*args)           |
+-------------------+------------------------+--------------------+
| classmethod       | f(type(obj), *args)    | f(cls, *args)      |
+-------------------+------------------------+--------------------+

Static methods
--------------

Static methods return the underlying function without changes.
Calling either "c.f" or "C.f" is the equivalent of a direct lookup
into "object.__getattribute__(c, "f")" or
"object.__getattribute__(C, "f")". As a result, the function becomes
identically accessible from either an object or a class.

Good candidates for static methods are methods that do not reference
the "self" variable.

For instance, a statistics package may include a container class for
experimental data. The class provides normal methods for computing
the average, mean, median, and other descriptive statistics that
depend on the data. However, there may be useful functions which are
conceptually related but do not depend on the data. For instance,
"erf(x)" is a handy conversion routine that comes up in statistical
work but does not directly depend on a particular dataset. It can be
called either from an object or the class: "s.erf(1.5) --> 0.9332" or
"Sample.erf(1.5) --> 0.9332".
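The guide doesn’t show that container class, but here is a plausible
sketch ("Sample" and its methods are invented for illustration, and
the “conversion routine” is taken to be the standard normal CDF,
which produces the 0.9332 figure quoted above):

   import math

   class Sample:

       def __init__(self, data):
           self.data = data

       def mean(self):
           # Depends on the instance's data, so a regular method.
           return sum(self.data) / len(self.data)

       @staticmethod
       def erf(x):
           # Conceptually related to the statistics above but uses no
           # instance data -- a good candidate for a static method.
           return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

   >>> round(Sample.erf(1.5), 4)           # Called from the class
   0.9332
   >>> s = Sample([1, 2, 3])
   >>> round(s.erf(1.5), 4)                # Called from an instance
   0.9332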
Since static methods return the underlying function with no changes,
the example calls are unexciting:

   class E:
       @staticmethod
       def f(x):
           return x * 10

   >>> E.f(3)
   30
   >>> E().f(3)
   30

Using the non-data descriptor protocol, a pure Python version of
"staticmethod()" would look like this:

   import functools

   class StaticMethod:
       "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

       def __init__(self, f):
           self.f = f
           functools.update_wrapper(self, f)

       def __get__(self, obj, objtype=None):
           return self.f

       def __call__(self, *args, **kwds):
           return self.f(*args, **kwds)

The "functools.update_wrapper()" call adds a "__wrapped__" attribute
that refers to the underlying function. Also it carries forward the
attributes necessary to make the wrapper look like the wrapped
function: "__name__", "__qualname__", "__doc__", and
"__annotations__".

Class methods
-------------

Unlike static methods, class methods prepend the class reference to
the argument list before calling the function. This format is the
same whether the caller is an object or a class:

   class F:
       @classmethod
       def f(cls, x):
           return cls.__name__, x

   >>> F.f(3)
   ('F', 3)
   >>> F().f(3)
   ('F', 3)

This behavior is useful whenever the method only needs to have a
class reference and does not rely on data stored in a specific
instance. One use for class methods is to create alternate class
constructors. For example, the classmethod "dict.fromkeys()" creates
a new dictionary from a list of keys. The pure Python equivalent is:

   class Dict(dict):
       @classmethod
       def fromkeys(cls, iterable, value=None):
           "Emulate dict_fromkeys() in Objects/dictobject.c"
           d = cls()
           for key in iterable:
               d[key] = value
           return d

Now a new dictionary of unique keys can be constructed like this:

   >>> d = Dict.fromkeys('abracadabra')
   >>> type(d) is Dict
   True
   >>> d
   {'a': None, 'b': None, 'r': None, 'c': None, 'd': None}

Using the non-data descriptor protocol, a pure Python version of
"classmethod()" would look like this:

   import functools

   class ClassMethod:
       "Emulate PyClassMethod_Type() in Objects/funcobject.c"

       def __init__(self, f):
           self.f = f
           functools.update_wrapper(self, f)

       def __get__(self, obj, cls=None):
           if cls is None:
               cls = type(obj)
           return MethodType(self.f, cls)

The "functools.update_wrapper()" call in "ClassMethod" adds a
"__wrapped__" attribute that refers to the underlying function. Also
it carries forward the attributes necessary to make the wrapper look
like the wrapped function: "__name__", "__qualname__", "__doc__", and
"__annotations__".

Member objects and __slots__
----------------------------

When a class defines "__slots__", it replaces instance dictionaries
with a fixed-length array of slot values. From a user point of view
that has several effects:

1. Provides immediate detection of bugs due to misspelled attribute
   assignments. Only attribute names specified in "__slots__" are
   allowed:

      class Vehicle:
          __slots__ = ('id_number', 'make', 'model')

      >>> auto = Vehicle()
      >>> auto.id_nubmer = 'VYE483814LQEX'
      Traceback (most recent call last):
          ...
      AttributeError: 'Vehicle' object has no attribute 'id_nubmer'
2. Helps create immutable objects where descriptors manage access to
   private attributes stored in "__slots__":

      class Immutable:

          __slots__ = ('_dept', '_name')          # Replace the instance dictionary

          def __init__(self, dept, name):
              self._dept = dept                   # Store to private attribute
              self._name = name                   # Store to private attribute

          @property                               # Read-only descriptor
          def dept(self):
              return self._dept

          @property
          def name(self):                         # Read-only descriptor
              return self._name

      >>> mark = Immutable('Botany', 'Mark Watney')
      >>> mark.dept
      'Botany'
      >>> mark.dept = 'Space Pirate'
      Traceback (most recent call last):
          ...
      AttributeError: property 'dept' of 'Immutable' object has no setter
      >>> mark.location = 'Mars'
      Traceback (most recent call last):
          ...
      AttributeError: 'Immutable' object has no attribute 'location'

3. Saves memory. On a 64-bit Linux build, an instance with two
   attributes takes 48 bytes with "__slots__" and 152 bytes without.
   This flyweight design pattern likely only matters when a large
   number of instances are going to be created.

4. Improves speed. Reading instance variables is 35% faster with
   "__slots__" (as measured with Python 3.10 on an Apple M1
   processor).

5. Blocks tools like "functools.cached_property()" which require an
   instance dictionary to function correctly:

      from functools import cached_property

      class CP:
          __slots__ = ()                          # Eliminates the instance dict

          @cached_property                        # Requires an instance dict
          def pi(self):
              return 4 * sum((-1.0)**n / (2.0*n + 1.0)
                             for n in reversed(range(100_000)))

      >>> CP().pi
      Traceback (most recent call last):
        ...
      TypeError: No '__dict__' attribute on 'CP' instance to cache 'pi' property.

It is not possible to create an exact drop-in pure Python version of
"__slots__" because it requires direct access to C structures and
control over object memory allocation. However, we can build a mostly
faithful simulation where the actual C structure for slots is
emulated by a private "_slotvalues" list. Reads and writes to that
private structure are managed by member descriptors:

   null = object()

   class Member:

       def __init__(self, name, clsname, offset):
           'Emulate PyMemberDef in Include/structmember.h'
           # Also see descr_new() in Objects/descrobject.c
           self.name = name
           self.clsname = clsname
           self.offset = offset

       def __get__(self, obj, objtype=None):
           'Emulate member_get() in Objects/descrobject.c'
           # Also see PyMember_GetOne() in Python/structmember.c
           if obj is None:
               return self
           value = obj._slotvalues[self.offset]
           if value is null:
               raise AttributeError(self.name)
           return value

       def __set__(self, obj, value):
           'Emulate member_set() in Objects/descrobject.c'
           obj._slotvalues[self.offset] = value

       def __delete__(self, obj):
           'Emulate member_delete() in Objects/descrobject.c'
           value = obj._slotvalues[self.offset]
           if value is null:
               raise AttributeError(self.name)
           obj._slotvalues[self.offset] = null

       def __repr__(self):
           'Emulate member_repr() in Objects/descrobject.c'
           return f'<Member {self.name!r} of {self.clsname!r}>'

The "type.__new__()" method takes care of adding member objects to
class variables:

   class Type(type):
       'Simulate how the type metaclass adds member objects for slots'

       def __new__(mcls, clsname, bases, mapping, **kwargs):
           'Emulate type_new() in Objects/typeobject.c'
           # type_new() calls PyTypeReady() which calls add_methods()
           slot_names = mapping.get('slot_names', [])
           for offset, name in enumerate(slot_names):
               mapping[name] = Member(name, clsname, offset)
           return type.__new__(mcls, clsname, bases, mapping, **kwargs)

The "object.__new__()" method takes care of creating instances that
have slots instead of an instance dictionary.
Here is a rough simulation in pure Python:

   class Object:
       'Simulate how object.__new__() allocates memory for __slots__'

       def __new__(cls, *args, **kwargs):
           'Emulate object_new() in Objects/typeobject.c'
           inst = super().__new__(cls)
           if hasattr(cls, 'slot_names'):
               empty_slots = [null] * len(cls.slot_names)
               object.__setattr__(inst, '_slotvalues', empty_slots)
           return inst

       def __setattr__(self, name, value):
           'Emulate _PyObject_GenericSetAttrWithDict() Objects/object.c'
           cls = type(self)
           if hasattr(cls, 'slot_names') and name not in cls.slot_names:
               raise AttributeError(
                   f'{cls.__name__!r} object has no attribute {name!r}'
               )
           super().__setattr__(name, value)

       def __delattr__(self, name):
           'Emulate _PyObject_GenericSetAttrWithDict() Objects/object.c'
           cls = type(self)
           if hasattr(cls, 'slot_names') and name not in cls.slot_names:
               raise AttributeError(
                   f'{cls.__name__!r} object has no attribute {name!r}'
               )
           super().__delattr__(name)

To use the simulation in a real class, just inherit from "Object" and
set the *metaclass* to "Type":

   class H(Object, metaclass=Type):
       'Instance variables stored in slots'

       slot_names = ['x', 'y']

       def __init__(self, x, y):
           self.x = x
           self.y = y

At this point, the metaclass has loaded member objects for *x* and
*y*:

   >>> from pprint import pp
   >>> pp(dict(vars(H)))
   {'__module__': '__main__',
    '__doc__': 'Instance variables stored in slots',
    'slot_names': ['x', 'y'],
    '__init__': <function H.__init__ at 0x7fb5d302f9d0>,
    'x': <Member 'x' of 'H'>,
    'y': <Member 'y' of 'H'>}

When instances are created, they have a "_slotvalues" list where the
attributes are stored:

   >>> h = H(10, 20)
   >>> vars(h)
   {'_slotvalues': [10, 20]}
   >>> h.x = 55
   >>> vars(h)
   {'_slotvalues': [55, 20]}

Misspelled or unassigned attributes will raise an exception:

   >>> h.xz
   Traceback (most recent call last):
       ...
   AttributeError: 'H' object has no attribute 'xz'

Enum HOWTO
**********

An "Enum" is a set of symbolic names bound to unique values. They are
similar to global variables, but they offer a more useful "repr()",
grouping, type-safety, and a few other features.

They are most useful when you have a variable that can take one of a
limited selection of values. For example, the days of the week:

   >>> from enum import Enum
   >>> class Weekday(Enum):
   ...     MONDAY = 1
   ...     TUESDAY = 2
   ...     WEDNESDAY = 3
   ...     THURSDAY = 4
   ...     FRIDAY = 5
   ...     SATURDAY = 6
   ...     SUNDAY = 7

Or perhaps the RGB primary colors:

   >>> from enum import Enum
   >>> class Color(Enum):
   ...     RED = 1
   ...     GREEN = 2
   ...     BLUE = 3

As you can see, creating an "Enum" is as simple as writing a class
that inherits from "Enum" itself.

Note: Case of Enum Members

  Because Enums are used to represent constants, and to help avoid
  issues with name clashes between mixin-class methods/attributes and
  enum names, we strongly recommend using UPPER_CASE names for
  members, and will be using that style in our examples.

Depending on the nature of the enum a member’s value may or may not
be important, but either way that value can be used to get the
corresponding member:

   >>> Weekday(3)
   <Weekday.WEDNESDAY: 3>

As you can see, the "repr()" of a member shows the enum name, the
member name, and the value.
The "str()" of a member shows only the enum name and member name: >>> print(Weekday.THURSDAY) Weekday.THURSDAY The *type* of an enumeration member is the enum it belongs to: >>> type(Weekday.MONDAY) >>> isinstance(Weekday.FRIDAY, Weekday) True Enum members have an attribute that contains just their "name": >>> print(Weekday.TUESDAY.name) TUESDAY Likewise, they have an attribute for their "value": >>> Weekday.WEDNESDAY.value 3 Unlike many languages that treat enumerations solely as name/value pairs, Python Enums can have behavior added. For example, "datetime.date" has two methods for returning the weekday: "weekday()" and "isoweekday()". The difference is that one of them counts from 0-6 and the other from 1-7. Rather than keep track of that ourselves we can add a method to the "Weekday" enum to extract the day from the "date" instance and return the matching enum member: @classmethod def from_date(cls, date): return cls(date.isoweekday()) The complete "Weekday" enum now looks like this: >>> class Weekday(Enum): ... MONDAY = 1 ... TUESDAY = 2 ... WEDNESDAY = 3 ... THURSDAY = 4 ... FRIDAY = 5 ... SATURDAY = 6 ... SUNDAY = 7 ... # ... @classmethod ... def from_date(cls, date): ... return cls(date.isoweekday()) Now we can find out what today is! Observe: >>> from datetime import date >>> Weekday.from_date(date.today()) Of course, if you’re reading this on some other day, you’ll see that day instead. This "Weekday" enum is great if our variable only needs one day, but what if we need several? Maybe we’re writing a function to plot chores during a week, and don’t want to use a "list" – we could use a different type of "Enum": >>> from enum import Flag >>> class Weekday(Flag): ... MONDAY = 1 ... TUESDAY = 2 ... WEDNESDAY = 4 ... THURSDAY = 8 ... FRIDAY = 16 ... SATURDAY = 32 ... SUNDAY = 64 We’ve changed two things: we’re inherited from "Flag", and the values are all powers of 2. Just like the original "Weekday" enum above, we can have a single selection: >>> first_week_day = Weekday.MONDAY >>> first_week_day But "Flag" also allows us to combine several members into a single variable: >>> weekend = Weekday.SATURDAY | Weekday.SUNDAY >>> weekend You can even iterate over a "Flag" variable: >>> for day in weekend: ... print(day) Weekday.SATURDAY Weekday.SUNDAY Okay, let’s get some chores set up: >>> chores_for_ethan = { ... 'feed the cat': Weekday.MONDAY | Weekday.WEDNESDAY | Weekday.FRIDAY, ... 'do the dishes': Weekday.TUESDAY | Weekday.THURSDAY, ... 'answer SO questions': Weekday.SATURDAY, ... } And a function to display the chores for a given day: >>> def show_chores(chores, day): ... for chore, days in chores.items(): ... if day in days: ... print(chore) ... >>> show_chores(chores_for_ethan, Weekday.SATURDAY) answer SO questions In cases where the actual values of the members do not matter, you can save yourself some work and use "auto()" for the values: >>> from enum import auto >>> class Weekday(Flag): ... MONDAY = auto() ... TUESDAY = auto() ... WEDNESDAY = auto() ... THURSDAY = auto() ... FRIDAY = auto() ... SATURDAY = auto() ... SUNDAY = auto() ... WEEKEND = SATURDAY | SUNDAY Programmatic access to enumeration members and their attributes =============================================================== Sometimes it’s useful to access members in enumerations programmatically (i.e. situations where "Color.RED" won’t do because the exact color is not known at program-writing time). 
"Enum" allows such access: >>> Color(1) >>> Color(3) If you want to access enum members by *name*, use item access: >>> Color['RED'] >>> Color['GREEN'] If you have an enum member and need its "name" or "value": >>> member = Color.RED >>> member.name 'RED' >>> member.value 1 Duplicating enum members and values =================================== Having two enum members with the same name is invalid: >>> class Shape(Enum): ... SQUARE = 2 ... SQUARE = 3 ... Traceback (most recent call last): ... TypeError: 'SQUARE' already defined as 2 However, an enum member can have other names associated with it. Given two entries "A" and "B" with the same value (and "A" defined first), "B" is an alias for the member "A". By-value lookup of the value of "A" will return the member "A". By-name lookup of "A" will return the member "A". By-name lookup of "B" will also return the member "A": >>> class Shape(Enum): ... SQUARE = 2 ... DIAMOND = 1 ... CIRCLE = 3 ... ALIAS_FOR_SQUARE = 2 ... >>> Shape.SQUARE >>> Shape.ALIAS_FOR_SQUARE >>> Shape(2) Note: Attempting to create a member with the same name as an already defined attribute (another member, a method, etc.) or attempting to create an attribute with the same name as a member is not allowed. Ensuring unique enumeration values ================================== By default, enumerations allow multiple names as aliases for the same value. When this behavior isn’t desired, you can use the "unique()" decorator: >>> from enum import Enum, unique >>> @unique ... class Mistake(Enum): ... ONE = 1 ... TWO = 2 ... THREE = 3 ... FOUR = 3 ... Traceback (most recent call last): ... ValueError: duplicate values found in : FOUR -> THREE Using automatic values ====================== If the exact value is unimportant you can use "auto": >>> from enum import Enum, auto >>> class Color(Enum): ... RED = auto() ... BLUE = auto() ... GREEN = auto() ... >>> [member.value for member in Color] [1, 2, 3] The values are chosen by "_generate_next_value_()", which can be overridden: >>> class AutoName(Enum): ... @staticmethod ... def _generate_next_value_(name, start, count, last_values): ... return name ... >>> class Ordinal(AutoName): ... NORTH = auto() ... SOUTH = auto() ... EAST = auto() ... WEST = auto() ... >>> [member.value for member in Ordinal] ['NORTH', 'SOUTH', 'EAST', 'WEST'] Note: The "_generate_next_value_()" method must be defined before any members. Iteration ========= Iterating over the members of an enum does not provide the aliases: >>> list(Shape) [, , ] >>> list(Weekday) [, , , , , , ] Note that the aliases "Shape.ALIAS_FOR_SQUARE" and "Weekday.WEEKEND" aren’t shown. The special attribute "__members__" is a read-only ordered mapping of names to members. It includes all names defined in the enumeration, including the aliases: >>> for name, member in Shape.__members__.items(): ... name, member ... ('SQUARE', ) ('DIAMOND', ) ('CIRCLE', ) ('ALIAS_FOR_SQUARE', ) The "__members__" attribute can be used for detailed programmatic access to the enumeration members. For example, finding all the aliases: >>> [name for name, member in Shape.__members__.items() if member.name != name] ['ALIAS_FOR_SQUARE'] Note: Aliases for flags include values with multiple flags set, such as "3", and no flags set, i.e. "0". Comparisons =========== Enumeration members are compared by identity: >>> Color.RED is Color.RED True >>> Color.RED is Color.BLUE False >>> Color.RED is not Color.BLUE True Ordered comparisons between enumeration values are *not* supported. 
Enum members are not integers (but see IntEnum below):

   >>> Color.RED < Color.BLUE
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: '<' not supported between instances of 'Color' and 'Color'

Equality comparisons are defined though:

   >>> Color.BLUE == Color.RED
   False
   >>> Color.BLUE != Color.RED
   True
   >>> Color.BLUE == Color.BLUE
   True

Comparisons against non-enumeration values will always compare not
equal (again, "IntEnum" was explicitly designed to behave
differently, see below):

   >>> Color.BLUE == 2
   False

Warning: It is possible to reload modules -- if a reloaded module
  contains enums, they will be recreated, and the new members may not
  compare identical/equal to the original members.


Allowed members and attributes of enumerations
==============================================

Most of the examples above use integers for enumeration values.
Using integers is short and handy (and provided by default by the
Functional API), but not strictly enforced.  In the vast majority of
use-cases, one doesn't care what the actual value of an enumeration
is.  But if the value *is* important, enumerations can have arbitrary
values.

Enumerations are Python classes, and can have methods and special
methods as usual.  If we have this enumeration:

   >>> class Mood(Enum):
   ...     FUNKY = 1
   ...     HAPPY = 3
   ...
   ...     def describe(self):
   ...         # self is the member here
   ...         return self.name, self.value
   ...
   ...     def __str__(self):
   ...         return 'my custom str! {0}'.format(self.value)
   ...
   ...     @classmethod
   ...     def favorite_mood(cls):
   ...         # cls here is the enumeration
   ...         return cls.HAPPY
   ...

Then:

   >>> Mood.favorite_mood()
   <Mood.HAPPY: 3>
   >>> Mood.HAPPY.describe()
   ('HAPPY', 3)
   >>> str(Mood.FUNKY)
   'my custom str! 1'

The rules for what is allowed are as follows: names that start and
end with a single underscore are reserved by enum and cannot be used;
all other attributes defined within an enumeration will become
members of this enumeration, with the exception of special methods
("__str__()", "__add__()", etc.), descriptors (methods are also
descriptors), and variable names listed in "_ignore_".

Note: if your enumeration defines "__new__()" and/or "__init__()",
  any value(s) given to the enum member will be passed into those
  methods.  See Planet for an example.

Note: The "__new__()" method, if defined, is used during creation of
  the Enum members; it is then replaced by Enum's "__new__()" which
  is used after class creation for lookup of existing members.  See
  When to use __new__() vs. __init__() for more details.


Restricted Enum subclassing
===========================

A new "Enum" class must have one base enum class, up to one concrete
data type, and as many "object"-based mixin classes as needed.  The
order of these base classes is:

   class EnumName([mix-in, ...,] [data-type,] base-enum):
       pass

Also, subclassing an enumeration is allowed only if the enumeration
does not define any members.  So this is forbidden:

   >>> class MoreColor(Color):
   ...     PINK = 17
   ...
   Traceback (most recent call last):
   ...
   TypeError: <enum 'MoreColor'> cannot extend <enum 'Color'>

But this is allowed:

   >>> class Foo(Enum):
   ...     def some_behavior(self):
   ...         pass
   ...
   >>> class Bar(Foo):
   ...     HAPPY = 1
   ...     SAD = 2
   ...

Allowing subclassing of enums that define members would lead to a
violation of some important invariants of types and instances.  On
the other hand, it makes sense to allow sharing some common behavior
between a group of enumerations.  (See OrderedEnum for an example.)


Dataclass support
=================

When inheriting from a "dataclass", the "__repr__()" omits the
inherited class' name.
For example:

   >>> from dataclasses import dataclass, field
   >>> @dataclass
   ... class CreatureDataMixin:
   ...     size: str
   ...     legs: int
   ...     tail: bool = field(repr=False, default=True)
   ...
   >>> class Creature(CreatureDataMixin, Enum):
   ...     BEETLE = 'small', 6
   ...     DOG = 'medium', 4
   ...
   >>> Creature.DOG
   <Creature.DOG: CreatureDataMixin(size='medium', legs=4)>

Use the "dataclass()" argument "repr=False" to use the standard
"repr()".

Changed in version 3.12: Only the dataclass fields are shown in the
value area, not the dataclass' name.

Note: Adding the "dataclass()" decorator to "Enum" and its subclasses
  is not supported.  It will not raise any errors, but it will
  produce very strange results at runtime, such as members being
  equal to each other:

     >>> @dataclass               # don't do this: it does not make any sense
     ... class Color(Enum):
     ...     RED = 1
     ...     BLUE = 2
     ...
     >>> Color.RED is Color.BLUE
     False
     >>> Color.RED == Color.BLUE  # problem is here: they should not be equal
     True


Pickling
========

Enumerations can be pickled and unpickled:

   >>> from test.test_enum import Fruit
   >>> from pickle import dumps, loads
   >>> Fruit.TOMATO is loads(dumps(Fruit.TOMATO))
   True

The usual restrictions for pickling apply: picklable enums must be
defined in the top level of a module, since unpickling requires them
to be importable from that module.

Note: With pickle protocol version 4 it is possible to easily pickle
  enums nested in other classes.

It is possible to modify how enum members are pickled/unpickled by
defining "__reduce_ex__()" in the enumeration class.  The default
method is by-value, but enums with complicated values may want to use
by-name:

   >>> import enum
   >>> class MyEnum(enum.Enum):
   ...     __reduce_ex__ = enum.pickle_by_enum_name

Note: Using by-name for flags is not recommended, as unnamed aliases
  will not unpickle.


Functional API
==============

The "Enum" class is callable, providing the following functional API:

   >>> Animal = Enum('Animal', 'ANT BEE CAT DOG')
   >>> Animal
   <enum 'Animal'>
   >>> Animal.ANT
   <Animal.ANT: 1>
   >>> list(Animal)
   [<Animal.ANT: 1>, <Animal.BEE: 2>, <Animal.CAT: 3>, <Animal.DOG: 4>]

The semantics of this API resemble "namedtuple".  The first argument
of the call to "Enum" is the name of the enumeration.  The second
argument is the *source* of enumeration member names.  It can be a
whitespace-separated string of names, a sequence of names, a sequence
of 2-tuples with key/value pairs, or a mapping (e.g. dictionary) of
names to values.  The last two options enable assigning arbitrary
values to enumerations; the others auto-assign increasing integers
starting with 1 (use the "start" parameter to specify a different
starting value).  A new class derived from "Enum" is returned.  In
other words, the above assignment to "Animal" is equivalent to:

   >>> class Animal(Enum):
   ...     ANT = 1
   ...     BEE = 2
   ...     CAT = 3
   ...     DOG = 4
   ...

The reason for defaulting to "1" as the starting number and not "0"
is that "0" is "False" in a boolean sense, but by default enum
members all evaluate to "True".

Pickling enums created with the functional API can be tricky as frame
stack implementation details are used to try and figure out which
module the enumeration is being created in (e.g. it will fail if you
use a utility function in a separate module, and also may not work on
IronPython or Jython).  The solution is to specify the module name
explicitly as follows:

   >>> Animal = Enum('Animal', 'ANT BEE CAT DOG', module=__name__)

Warning: If "module" is not supplied, and Enum cannot determine what
  it is, the new Enum members will not be unpicklable; to keep errors
  closer to the source, pickling will be disabled.
The new pickle protocol 4 also, in some circumstances, relies on
"__qualname__" being set to the location where pickle will be able to
find the class.  For example, if the class was made available in
class SomeData in the global scope:

   >>> Animal = Enum('Animal', 'ANT BEE CAT DOG', qualname='SomeData.Animal')

The complete signature is:

   Enum(
       value='NewEnumName',
       names=<...>,
       *,
       module='...',
       qualname='...',
       type=<mixed-in class>,
       start=1,
       )

* *value*: What the new enum class will record as its name.

* *names*: The enum members.  This can be a whitespace- or comma-
  separated string (values will start at 1 unless otherwise
  specified):

     'RED GREEN BLUE' | 'RED,GREEN,BLUE' | 'RED, GREEN, BLUE'

  or an iterator of names:

     ['RED', 'GREEN', 'BLUE']

  or an iterator of (name, value) pairs:

     [('CYAN', 4), ('MAGENTA', 5), ('YELLOW', 6)]

  or a mapping:

     {'CHARTREUSE': 7, 'SEA_GREEN': 11, 'ROSEMARY': 42}

* *module*: name of module where new enum class can be found.

* *qualname*: where in module new enum class can be found.

* *type*: type to mix in to new enum class.

* *start*: number to start counting at if only names are passed in.

Changed in version 3.5: The *start* parameter was added.


Derived Enumerations
====================

IntEnum
-------

The first variation of "Enum" that is provided is also a subclass of
"int".  Members of an "IntEnum" can be compared to integers; by
extension, integer enumerations of different types can also be
compared to each other:

   >>> from enum import IntEnum
   >>> class Shape(IntEnum):
   ...     CIRCLE = 1
   ...     SQUARE = 2
   ...
   >>> class Request(IntEnum):
   ...     POST = 1
   ...     GET = 2
   ...
   >>> Shape == 1
   False
   >>> Shape.CIRCLE == 1
   True
   >>> Shape.CIRCLE == Request.POST
   True

However, they still can't be compared to standard "Enum"
enumerations:

   >>> class Shape(IntEnum):
   ...     CIRCLE = 1
   ...     SQUARE = 2
   ...
   >>> class Color(Enum):
   ...     RED = 1
   ...     GREEN = 2
   ...
   >>> Shape.CIRCLE == Color.RED
   False

"IntEnum" values behave like integers in other ways you'd expect:

   >>> int(Shape.CIRCLE)
   1
   >>> ['a', 'b', 'c'][Shape.CIRCLE]
   'b'
   >>> [i for i in range(Shape.SQUARE)]
   [0, 1]


StrEnum
-------

The second variation of "Enum" that is provided is also a subclass of
"str".  Members of a "StrEnum" can be compared to strings; by
extension, string enumerations of different types can also be
compared to each other.

Added in version 3.11.


IntFlag
-------

The next variation of "Enum" provided, "IntFlag", is also based on
"int".  The difference being "IntFlag" members can be combined using
the bitwise operators (&, |, ^, ~) and the result is still an
"IntFlag" member, if possible.  Like "IntEnum", "IntFlag" members are
also integers and can be used wherever an "int" is used.

Note: Any operation on an "IntFlag" member besides the bit-wise
  operations will lose the "IntFlag" membership.  Bit-wise operations
  that result in invalid "IntFlag" values will lose the "IntFlag"
  membership.  See "FlagBoundary" for details.

Added in version 3.6.

Changed in version 3.11.

Sample "IntFlag" class:

   >>> from enum import IntFlag
   >>> class Perm(IntFlag):
   ...     R = 4
   ...     W = 2
   ...     X = 1
   ...
   >>> Perm.R | Perm.W
   <Perm.R|W: 6>
   >>> Perm.R + Perm.W
   6
   >>> RW = Perm.R | Perm.W
   >>> Perm.R in RW
   True

It is also possible to name the combinations:

   >>> class Perm(IntFlag):
   ...     R = 4
   ...     W = 2
   ...     X = 1
   ...     RWX = 7
   ...
   >>> Perm.RWX
   <Perm.RWX: 7>
   >>> ~Perm.RWX
   <Perm: 0>
   >>> Perm(7)
   <Perm.RWX: 7>

Note: Named combinations are considered aliases.  Aliases do not show
  up during iteration, but can be returned from by-value lookups.

Changed in version 3.11.
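To make the aliasing behavior in the note above concrete, here is a
short doctest-style sketch of our own (not from the original HOWTO),
reusing the "Perm" class with the "RWX" alias defined above:

   >>> list(Perm)          # the RWX alias is skipped during iteration
   [<Perm.R: 4>, <Perm.W: 2>, <Perm.X: 1>]
   >>> Perm['RWX']         # but by-name lookup still finds it
   <Perm.RWX: 7>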
Another important difference between "IntFlag" and "Enum" is that if
no flags are set (the value is 0), its boolean evaluation is "False":

   >>> Perm.R & Perm.X
   <Perm: 0>
   >>> bool(Perm.R & Perm.X)
   False

Because "IntFlag" members are also subclasses of "int" they can be
combined with them (but may lose "IntFlag" membership):

   >>> Perm.X | 4
   <Perm.R|X: 5>
   >>> Perm.X + 8
   9

Note: The negation operator, "~", always returns an "IntFlag" member
  with a positive value:

     >>> (~Perm.X).value == (Perm.R|Perm.W).value == 6
     True

"IntFlag" members can also be iterated over:

   >>> list(RW)
   [<Perm.R: 4>, <Perm.W: 2>]

Added in version 3.11.


Flag
----

The last variation is "Flag".  Like "IntFlag", "Flag" members can be
combined using the bitwise operators (&, |, ^, ~).  Unlike "IntFlag",
they cannot be combined with, nor compared against, any other "Flag"
enumeration, nor "int".  While it is possible to specify the values
directly it is recommended to use "auto" as the value and let "Flag"
select an appropriate value.

Added in version 3.6.

Like "IntFlag", if a combination of "Flag" members results in no
flags being set, the boolean evaluation is "False":

   >>> from enum import Flag, auto
   >>> class Color(Flag):
   ...     RED = auto()
   ...     BLUE = auto()
   ...     GREEN = auto()
   ...
   >>> Color.RED & Color.GREEN
   <Color: 0>
   >>> bool(Color.RED & Color.GREEN)
   False

Individual flags should have values that are powers of two (1, 2, 4,
8, ...), while combinations of flags will not:

   >>> class Color(Flag):
   ...     RED = auto()
   ...     BLUE = auto()
   ...     GREEN = auto()
   ...     WHITE = RED | BLUE | GREEN
   ...
   >>> Color.WHITE
   <Color.WHITE: 7>

Giving a name to the "no flags set" condition does not change its
boolean value:

   >>> class Color(Flag):
   ...     BLACK = 0
   ...     RED = auto()
   ...     BLUE = auto()
   ...     GREEN = auto()
   ...
   >>> Color.BLACK
   <Color.BLACK: 0>
   >>> bool(Color.BLACK)
   False

"Flag" members can also be iterated over:

   >>> purple = Color.RED | Color.BLUE
   >>> list(purple)
   [<Color.RED: 1>, <Color.BLUE: 2>]

Added in version 3.11.

Note: For the majority of new code, "Enum" and "Flag" are strongly
  recommended, since "IntEnum" and "IntFlag" break some semantic
  promises of an enumeration (by being comparable to integers, and
  thus by transitivity to other unrelated enumerations).  "IntEnum"
  and "IntFlag" should be used only in cases where "Enum" and "Flag"
  will not do; for example, when integer constants are replaced with
  enumerations, or for interoperability with other systems.


Others
------

While "IntEnum" is part of the "enum" module, it would be very simple
to implement independently:

   class IntEnum(int, ReprEnum):   # or Enum instead of ReprEnum
       pass

This demonstrates how similar derived enumerations can be defined;
for example a "FloatEnum" that mixes in "float" instead of "int".

Some rules:

1. When subclassing "Enum", mix-in types must appear before the
   "Enum" class itself in the sequence of bases, as in the "IntEnum"
   example above.

2. Mix-in types must be subclassable.  For example, "bool" and
   "range" are not subclassable and will throw an error during Enum
   creation if used as the mix-in type.

3. While "Enum" can have members of any type, once you mix in an
   additional type, all the members must have values of that type,
   e.g. "int" above.  This restriction does not apply to mix-ins
   which only add methods and don't specify another type.

4. When another data type is mixed in, the "value" attribute is *not
   the same* as the enum member itself, although it is equivalent and
   will compare equal.

5. A "data type" is a mixin that defines "__new__()", or a
   "dataclass"

6.
%-style formatting: "%s" and "%r" call the "Enum" class’s "__str__()" and "__repr__()" respectively; other codes (such as "%i" or "%h" for IntEnum) treat the enum member as its mixed-in type. 7. Formatted string literals, "str.format()", and "format()" will use the enum’s "__str__()" method. Note: Because "IntEnum", "IntFlag", and "StrEnum" are designed to be drop- in replacements for existing constants, their "__str__()" method has been reset to their data types’ "__str__()" method. When to use "__new__()" vs. "__init__()" ======================================== "__new__()" must be used whenever you want to customize the actual value of the "Enum" member. Any other modifications may go in either "__new__()" or "__init__()", with "__init__()" being preferred. For example, if you want to pass several items to the constructor, but only want one of them to be the value: >>> class Coordinate(bytes, Enum): ... """ ... Coordinate with binary codes that can be indexed by the int code. ... """ ... def __new__(cls, value, label, unit): ... obj = bytes.__new__(cls, [value]) ... obj._value_ = value ... obj.label = label ... obj.unit = unit ... return obj ... PX = (0, 'P.X', 'km') ... PY = (1, 'P.Y', 'km') ... VX = (2, 'V.X', 'km/s') ... VY = (3, 'V.Y', 'km/s') ... >>> print(Coordinate['PY']) Coordinate.PY >>> print(Coordinate(3)) Coordinate.VY Warning: *Do not* call "super().__new__()", as the lookup-only "__new__" is the one that is found; instead, use the data type directly. Finer Points ------------ Supported "__dunder__" names ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "__members__" is a read-only ordered mapping of "member_name":"member" items. It is only available on the class. "__new__()", if specified, must create and return the enum members; it is also a very good idea to set the member’s "_value_" appropriately. Once all the members are created it is no longer used. Supported "_sunder_" names ~~~~~~~~~~~~~~~~~~~~~~~~~~ * "_name_" – name of the member * "_value_" – value of the member; can be set in "__new__" * "_missing_()" – a lookup function used when a value is not found; may be overridden * "_ignore_" – a list of names, either as a "list" or a "str", that will not be transformed into members, and will be removed from the final class * "_generate_next_value_()" – used to get an appropriate value for an enum member; may be overridden * "_add_alias_()" – adds a new name as an alias to an existing member. * "_add_value_alias_()" – adds a new value as an alias to an existing member. See MultiValueEnum for an example. Note: For standard "Enum" classes the next value chosen is the highest value seen incremented by one.For "Flag" classes the next value chosen will be the next highest power-of-two. Changed in version 3.13: Prior versions would use the last seen value instead of the highest value. Added in version 3.6: "_missing_", "_order_", "_generate_next_value_" Added in version 3.7: "_ignore_" Added in version 3.13: "_add_alias_", "_add_value_alias_" To help keep Python 2 / Python 3 code in sync an "_order_" attribute can be provided. It will be checked against the actual order of the enumeration and raise an error if the two do not match: >>> class Color(Enum): ... _order_ = 'RED GREEN BLUE' ... RED = 1 ... BLUE = 3 ... GREEN = 2 ... Traceback (most recent call last): ... TypeError: member order does not match _order_: ['RED', 'BLUE', 'GREEN'] ['RED', 'GREEN', 'BLUE'] Note: In Python 2 code the "_order_" attribute is necessary as definition order is lost before it can be recorded. 
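As an illustration of one of the hooks listed above, here is a
minimal sketch of our own (not from the original HOWTO) that
overrides "_missing_()" so that by-value lookup of string-valued
members is case-insensitive:

   >>> from enum import Enum
   >>> class Color(Enum):
   ...     RED = 'red'
   ...     GREEN = 'green'
   ...
   ...     @classmethod
   ...     def _missing_(cls, value):
   ...         # Called only after normal by-value lookup fails;
   ...         # returning a member (or None) is accepted by Enum.
   ...         if isinstance(value, str):
   ...             for member in cls:
   ...                 if member.value == value.lower():
   ...                     return member
   ...         return None
   ...
   >>> Color('RED')
   <Color.RED: 'red'>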
_Private__names
~~~~~~~~~~~~~~~

Private names are not converted to enum members, but remain normal
attributes.

Changed in version 3.11.


"Enum" member type
~~~~~~~~~~~~~~~~~~

Enum members are instances of their enum class, and are normally
accessed as "EnumClass.member".  In certain situations, such as
writing custom enum behavior, being able to access one member
directly from another is useful, and is supported; however, in order
to avoid name clashes between member names and attributes/methods
from mixed-in classes, upper-case names are strongly recommended.

Changed in version 3.5.


Creating members that are mixed with other data types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When subclassing other data types, such as "int" or "str", with an
"Enum", all values after the "=" are passed to that data type's
constructor.  For example:

   >>> class MyEnum(IntEnum):      # help(int) -> int(x, base=10) -> integer
   ...     example = '11', 16      # so x='11' and base=16
   ...
   >>> MyEnum.example.value        # and hex(11) is...
   17


Boolean value of "Enum" classes and members
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Enum classes that are mixed with non-"Enum" types (such as "int",
"str", etc.) are evaluated according to the mixed-in type's rules;
otherwise, all members evaluate as "True".  To make your own enum's
boolean evaluation depend on the member's value add the following to
your class:

   def __bool__(self):
       return bool(self.value)

Plain "Enum" classes always evaluate as "True".


"Enum" classes with methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you give your enum subclass extra methods, like the Planet class
below, those methods will show up in a "dir()" of the member, but not
of the class:

   >>> dir(Planet)
   ['EARTH', 'JUPITER', 'MARS', 'MERCURY', 'NEPTUNE', 'SATURN', 'URANUS', 'VENUS', '__class__', '__doc__', '__members__', '__module__']
   >>> dir(Planet.EARTH)
   ['__class__', '__doc__', '__module__', 'mass', 'name', 'radius', 'surface_gravity', 'value']


Combining members of "Flag"
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Iterating over a combination of "Flag" members will only return the
members that are comprised of a single bit:

   >>> class Color(Flag):
   ...     RED = auto()
   ...     GREEN = auto()
   ...     BLUE = auto()
   ...     MAGENTA = RED | BLUE
   ...     YELLOW = RED | GREEN
   ...     CYAN = GREEN | BLUE
   ...
   >>> Color(3)  # named combination
   <Color.YELLOW: 3>
   >>> Color(7)  # not named combination
   <Color.RED|GREEN|BLUE: 7>


"Flag" and "IntFlag" minutia
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using the following snippet for our examples:

   >>> class Color(IntFlag):
   ...     BLACK = 0
   ...     RED = 1
   ...     GREEN = 2
   ...     BLUE = 4
   ...     PURPLE = RED | BLUE
   ...     WHITE = RED | GREEN | BLUE
   ...

the following are true:

* single-bit flags are canonical

* multi-bit and zero-bit flags are aliases

* only canonical flags are returned during iteration:

     >>> list(Color.WHITE)
     [<Color.RED: 1>, <Color.GREEN: 2>, <Color.BLUE: 4>]

* negating a flag or flag set returns a new flag/flag set with the
  corresponding positive integer value:

     >>> Color.BLUE
     <Color.BLUE: 4>
     >>> ~Color.BLUE
     <Color.RED|GREEN: 3>

* names of pseudo-flags are constructed from their members' names:

     >>> (Color.RED | Color.GREEN).name
     'RED|GREEN'

     >>> class Perm(IntFlag):
     ...     R = 4
     ...     W = 2
     ...     X = 1
     ...
     >>> (Perm.R & Perm.W).name is None  # effectively Perm(0)
     True

* multi-bit flags, aka aliases, can be returned from operations:

     >>> Color.RED | Color.BLUE
     <Color.PURPLE: 5>
     >>> Color(7)  # or Color(-1)
     <Color.WHITE: 7>
     >>> Color(0)
     <Color.BLACK: 0>

* membership / containment checking: zero-valued flags are always
  considered to be contained:

     >>> Color.BLACK in Color.WHITE
     True

  otherwise, only if all bits of one flag are in the other flag will
  True be returned:

     >>> Color.PURPLE in Color.WHITE
     True
     >>> Color.GREEN in Color.PURPLE
     False

There is a new boundary mechanism that controls how out-of-range /
invalid bits are handled: "STRICT", "CONFORM", "EJECT", and "KEEP":

* STRICT --> raises an exception when presented with invalid values

* CONFORM --> discards any invalid bits

* EJECT --> lose Flag status and become a normal int with the given
  value

* KEEP --> keep the extra bits

  * keeps Flag status and extra bits

  * extra bits do not show up in iteration

  * extra bits do show up in repr() and str()

The default for Flag is "STRICT", the default for "IntFlag" is
"EJECT", and the default for "_convert_" is "KEEP" (see "ssl.Options"
for an example of when "KEEP" is needed).


How are Enums and Flags different?
==================================

Enums have a custom metaclass that affects many aspects of both
derived "Enum" classes and their instances (members).


Enum Classes
------------

The "EnumType" metaclass is responsible for providing the
"__contains__()", "__dir__()", "__iter__()" and other methods that
allow one to do things with an "Enum" class that fail on a typical
class, such as "list(Color)" or "some_enum_var in Color".  "EnumType"
is responsible for ensuring that various other methods on the final
"Enum" class are correct (such as "__new__()", "__getnewargs__()",
"__str__()" and "__repr__()").


Flag Classes
------------

Flags have an expanded view of aliasing: to be canonical, the value
of a flag needs to be a power-of-two value, and not a duplicate name.
So, in addition to the "Enum" definition of alias, a flag with no
value (a.k.a. "0") or with more than one power-of-two value (e.g.
"3") is considered an alias.


Enum Members (aka instances)
----------------------------

The most interesting thing about enum members is that they are
singletons.  "EnumType" creates them all while it is creating the
enum class itself, and then puts a custom "__new__()" in place to
ensure that no new ones are ever instantiated by returning only the
existing member instances.


Flag Members
------------

Flag members can be iterated over just like the "Flag" class, and
only the canonical members will be returned.  For example:

   >>> list(Color)
   [<Color.RED: 1>, <Color.GREEN: 2>, <Color.BLUE: 4>]

(Note that "BLACK", "PURPLE", and "WHITE" do not show up.)

Inverting a flag member returns the corresponding positive value,
rather than a negative value, for example:

   >>> ~Color.RED
   <Color.GREEN|BLUE: 6>

Flag members have a length corresponding to the number of
power-of-two values they contain.  For example:

   >>> len(Color.PURPLE)
   2


Enum Cookbook
=============

While "Enum", "IntEnum", "StrEnum", "Flag", and "IntFlag" are
expected to cover the majority of use-cases, they cannot cover them
all.  Here are recipes for some different types of enumerations that
can be used directly, or as examples for creating one's own.


Omitting values
---------------

In many use-cases, one doesn't care what the actual value of an
enumeration is.
There are several ways to define this type of simple enumeration:

* use instances of "auto" for the value

* use instances of "object" as the value

* use a descriptive string as the value

* use a tuple as the value and a custom "__new__()" to replace the
  tuple with an "int" value

Using any of these methods signifies to the user that these values
are not important, and also enables one to add, remove, or reorder
members without having to renumber the remaining members.


Using "auto"
~~~~~~~~~~~~

Using "auto" would look like:

   >>> class Color(Enum):
   ...     RED = auto()
   ...     BLUE = auto()
   ...     GREEN = auto()
   ...
   >>> Color.GREEN
   <Color.GREEN: 3>


Using "object"
~~~~~~~~~~~~~~

Using "object" would look like:

   >>> class Color(Enum):
   ...     RED = object()
   ...     GREEN = object()
   ...     BLUE = object()
   ...
   >>> Color.GREEN
   <Color.GREEN: <object object at 0x...>>

This is also a good example of why you might want to write your own
"__repr__()":

   >>> class Color(Enum):
   ...     RED = object()
   ...     GREEN = object()
   ...     BLUE = object()
   ...     def __repr__(self):
   ...         return "<%s.%s>" % (self.__class__.__name__, self._name_)
   ...
   >>> Color.GREEN
   <Color.GREEN>


Using a descriptive string
~~~~~~~~~~~~~~~~~~~~~~~~~~

Using a string as the value would look like:

   >>> class Color(Enum):
   ...     RED = 'stop'
   ...     GREEN = 'go'
   ...     BLUE = 'too fast!'
   ...
   >>> Color.GREEN
   <Color.GREEN: 'go'>


Using a custom "__new__()"
~~~~~~~~~~~~~~~~~~~~~~~~~~

Using an auto-numbering "__new__()" would look like:

   >>> class AutoNumber(Enum):
   ...     def __new__(cls):
   ...         value = len(cls.__members__) + 1
   ...         obj = object.__new__(cls)
   ...         obj._value_ = value
   ...         return obj
   ...
   >>> class Color(AutoNumber):
   ...     RED = ()
   ...     GREEN = ()
   ...     BLUE = ()
   ...
   >>> Color.GREEN
   <Color.GREEN: 2>

To make a more general purpose "AutoNumber", add "*args" to the
signature:

   >>> class AutoNumber(Enum):
   ...     def __new__(cls, *args):      # this is the only change from above
   ...         value = len(cls.__members__) + 1
   ...         obj = object.__new__(cls)
   ...         obj._value_ = value
   ...         return obj
   ...

Then when you inherit from "AutoNumber" you can write your own
"__init__" to handle any extra arguments:

   >>> class Swatch(AutoNumber):
   ...     def __init__(self, pantone='unknown'):
   ...         self.pantone = pantone
   ...     AUBURN = '3497'
   ...     SEA_GREEN = '1246'
   ...     BLEACHED_CORAL = ()  # New color, no Pantone code yet!
   ...
   >>> Swatch.SEA_GREEN
   <Swatch.SEA_GREEN: 2>
   >>> Swatch.SEA_GREEN.pantone
   '1246'
   >>> Swatch.BLEACHED_CORAL.pantone
   'unknown'

Note: The "__new__()" method, if defined, is used during creation of
  the Enum members; it is then replaced by Enum's "__new__()" which
  is used after class creation for lookup of existing members.

Warning: *Do not* call "super().__new__()", as the lookup-only
  "__new__" is the one that is found; instead, use the data type
  directly, e.g.:

     obj = int.__new__(cls, value)


OrderedEnum
-----------

An ordered enumeration that is not based on "IntEnum" and so
maintains the normal "Enum" invariants (such as not being comparable
to other enumerations):

   >>> class OrderedEnum(Enum):
   ...     def __ge__(self, other):
   ...         if self.__class__ is other.__class__:
   ...             return self.value >= other.value
   ...         return NotImplemented
   ...     def __gt__(self, other):
   ...         if self.__class__ is other.__class__:
   ...             return self.value > other.value
   ...         return NotImplemented
   ...     def __le__(self, other):
   ...         if self.__class__ is other.__class__:
   ...             return self.value <= other.value
   ...         return NotImplemented
   ...     def __lt__(self, other):
   ...         if self.__class__ is other.__class__:
   ...             return self.value < other.value
   ...         return NotImplemented
   ...
   >>> class Grade(OrderedEnum):
   ...     A = 5
   ...     B = 4
   ...     C = 3
   ...     D = 2
   ...     F = 1
   ...
   >>> Grade.C < Grade.A
   True


DuplicateFreeEnum
-----------------

Raises an error if a duplicate member value is found instead of
creating an alias:

   >>> class DuplicateFreeEnum(Enum):
   ...     def __init__(self, *args):
   ...         cls = self.__class__
   ...         if any(self.value == e.value for e in cls):
   ...             a = self.name
   ...             e = cls(self.value).name
   ...             raise ValueError(
   ...                 "aliases not allowed in DuplicateFreeEnum:  %r --> %r"
   ...                 % (a, e))
   ...
   >>> class Color(DuplicateFreeEnum):
   ...     RED = 1
   ...     GREEN = 2
   ...     BLUE = 3
   ...     GRENE = 2
   ...
   Traceback (most recent call last):
   ...
   ValueError: aliases not allowed in DuplicateFreeEnum:  'GRENE' --> 'GREEN'

Note: This is a useful example for subclassing Enum to add or change
  other behaviors as well as disallowing aliases.  If the only
  desired change is disallowing aliases, the "unique()" decorator can
  be used instead.


MultiValueEnum
--------------

Supports having more than one value per member:

   >>> class MultiValueEnum(Enum):
   ...     def __new__(cls, value, *values):
   ...         self = object.__new__(cls)
   ...         self._value_ = value
   ...         for v in values:
   ...             self._add_value_alias_(v)
   ...         return self
   ...
   >>> class DType(MultiValueEnum):
   ...     float32 = 'f', 8
   ...     double64 = 'd', 9
   ...
   >>> DType('f')
   <DType.float32: 'f'>
   >>> DType(9)
   <DType.double64: 'd'>


Planet
------

If "__new__()" or "__init__()" is defined, the value of the enum
member will be passed to those methods:

   >>> class Planet(Enum):
   ...     MERCURY = (3.303e+23, 2.4397e6)
   ...     VENUS   = (4.869e+24, 6.0518e6)
   ...     EARTH   = (5.976e+24, 6.37814e6)
   ...     MARS    = (6.421e+23, 3.3972e6)
   ...     JUPITER = (1.9e+27,   7.1492e7)
   ...     SATURN  = (5.688e+26, 6.0268e7)
   ...     URANUS  = (8.686e+25, 2.5559e7)
   ...     NEPTUNE = (1.024e+26, 2.4746e7)
   ...     def __init__(self, mass, radius):
   ...         self.mass = mass       # in kilograms
   ...         self.radius = radius   # in meters
   ...     @property
   ...     def surface_gravity(self):
   ...         # universal gravitational constant  (m3 kg-1 s-2)
   ...         G = 6.67300E-11
   ...         return G * self.mass / (self.radius * self.radius)
   ...
   >>> Planet.EARTH.value
   (5.976e+24, 6378140.0)
   >>> Planet.EARTH.surface_gravity
   9.802652743337129


TimePeriod
----------

An example to show the "_ignore_" attribute in use:

   >>> from datetime import timedelta
   >>> class Period(timedelta, Enum):
   ...     "different lengths of time"
   ...     _ignore_ = 'Period i'
   ...     Period = vars()
   ...     for i in range(367):
   ...         Period['day_%d' % i] = i
   ...
   >>> list(Period)[:2]
   [<Period.day_0: datetime.timedelta(0)>, <Period.day_1: datetime.timedelta(days=1)>]
   >>> list(Period)[-2:]
   [<Period.day_365: datetime.timedelta(days=365)>, <Period.day_366: datetime.timedelta(days=366)>]


Subclassing EnumType
====================

While most enum needs can be met by customizing "Enum" subclasses,
either with class decorators or custom functions, "EnumType" can be
subclassed to provide a different Enum experience.


C API Extension Support for Free Threading
******************************************

Starting with the 3.13 release, CPython has experimental support for
running with the *global interpreter lock* (GIL) disabled in a
configuration called *free threading*.  This document describes how
to adapt C API extensions to support free threading.


Identifying the Free-Threaded Build in C
========================================

The CPython C API exposes the "Py_GIL_DISABLED" macro: in the free-
threaded build it's defined to "1", and in the regular build it's not
defined.  You can use it to enable code that only runs under the
free-threaded build:

   #ifdef Py_GIL_DISABLED
   /* code that only runs in the free-threaded build */
   #endif

Note: On Windows, this macro is not defined automatically, but must
  be specified to the compiler when building.
The "sysconfig.get_config_var()" function can be used to determine whether the current running interpreter had the macro defined. Module Initialization ===================== Extension modules need to explicitly indicate that they support running with the GIL disabled; otherwise importing the extension will raise a warning and enable the GIL at runtime. There are two ways to indicate that an extension module supports running with the GIL disabled depending on whether the extension uses multi-phase or single-phase initialization. Multi-Phase Initialization -------------------------- Extensions that use multi-phase initialization (i.e., "PyModuleDef_Init()") should add a "Py_mod_gil" slot in the module definition. If your extension supports older versions of CPython, you should guard the slot with a "PY_VERSION_HEX" check. static struct PyModuleDef_Slot module_slots[] = { ... #if PY_VERSION_HEX >= 0x030D0000 {Py_mod_gil, Py_MOD_GIL_NOT_USED}, #endif {0, NULL} }; static struct PyModuleDef moduledef = { PyModuleDef_HEAD_INIT, .m_slots = module_slots, ... }; Single-Phase Initialization --------------------------- Extensions that use single-phase initialization (i.e., "PyModule_Create()") should call "PyUnstable_Module_SetGIL()" to indicate that they support running with the GIL disabled. The function is only defined in the free-threaded build, so you should guard the call with "#ifdef Py_GIL_DISABLED" to avoid compilation errors in the regular build. static struct PyModuleDef moduledef = { PyModuleDef_HEAD_INIT, ... }; PyMODINIT_FUNC PyInit_mymodule(void) { PyObject *m = PyModule_Create(&moduledef); if (m == NULL) { return NULL; } #ifdef Py_GIL_DISABLED PyUnstable_Module_SetGIL(m, Py_MOD_GIL_NOT_USED); #endif return m; } General API Guidelines ====================== Most of the C API is thread-safe, but there are some exceptions. * **Struct Fields**: Accessing fields in Python C API objects or structs directly is not thread-safe if the field may be concurrently modified. * **Macros**: Accessor macros like "PyList_GET_ITEM" and "PyList_SET_ITEM" do not perform any error checking or locking. These macros are not thread-safe if the container object may be modified concurrently. * **Borrowed References**: C API functions that return *borrowed references* may not be thread-safe if the containing object is modified concurrently. See the section on borrowed references for more information. Container Thread Safety ----------------------- Containers like "PyListObject", "PyDictObject", and "PySetObject" perform internal locking in the free-threaded build. For example, the "PyList_Append()" will lock the list before appending an item. "PyDict_Next" ~~~~~~~~~~~~~ A notable exception is "PyDict_Next()", which does not lock the dictionary. You should use "Py_BEGIN_CRITICAL_SECTION" to protect the dictionary while iterating over it if the dictionary may be concurrently modified: Py_BEGIN_CRITICAL_SECTION(dict); PyObject *key, *value; Py_ssize_t pos = 0; while (PyDict_Next(dict, &pos, &key, &value)) { ... } Py_END_CRITICAL_SECTION(); Borrowed References =================== Some C API functions return *borrowed references*. These APIs are not thread-safe if the containing object is modified concurrently. For example, it’s not safe to use "PyList_GetItem()" if the list may be modified concurrently. The following table lists some borrowed reference APIs and their replacements that return *strong references*. 
+-------------------------------------+-------------------------------------+
| Borrowed reference API              | Strong reference API                |
|=====================================|=====================================|
| "PyList_GetItem()"                  | "PyList_GetItemRef()"               |
+-------------------------------------+-------------------------------------+
| "PyDict_GetItem()"                  | "PyDict_GetItemRef()"               |
+-------------------------------------+-------------------------------------+
| "PyDict_GetItemWithError()"         | "PyDict_GetItemRef()"               |
+-------------------------------------+-------------------------------------+
| "PyDict_GetItemString()"            | "PyDict_GetItemStringRef()"         |
+-------------------------------------+-------------------------------------+
| "PyDict_SetDefault()"               | "PyDict_SetDefaultRef()"            |
+-------------------------------------+-------------------------------------+
| "PyDict_Next()"                     | none (see PyDict_Next)              |
+-------------------------------------+-------------------------------------+
| "PyWeakref_GetObject()"             | "PyWeakref_GetRef()"                |
+-------------------------------------+-------------------------------------+
| "PyWeakref_GET_OBJECT()"            | "PyWeakref_GetRef()"                |
+-------------------------------------+-------------------------------------+
| "PyImport_AddModule()"              | "PyImport_AddModuleRef()"           |
+-------------------------------------+-------------------------------------+

Not all APIs that return borrowed references are problematic.  For
example, "PyTuple_GetItem()" is safe because tuples are immutable.
Similarly, not all uses of the above APIs are problematic.  For
example, "PyDict_GetItem()" is often used for parsing keyword
argument dictionaries in function calls; those keyword argument
dictionaries are effectively private (not accessible by other
threads), so using borrowed references in that context is safe.

Some of these functions were added in Python 3.13.  You can use the
pythoncapi-compat package to provide implementations of these
functions for older Python versions.


Memory Allocation APIs
======================

Python's memory management C API provides functions in three
different allocation domains: "raw", "mem", and "object".  For
thread-safety, the free-threaded build requires that only Python
objects are allocated using the object domain, and that all Python
objects are allocated using that domain.  This differs from prior
Python versions, where this was only a best practice and not a hard
requirement.

Note: Search for uses of "PyObject_Malloc()" in your extension and
  check that the allocated memory is used for Python objects.  Use
  "PyMem_Malloc()" to allocate buffers instead of
  "PyObject_Malloc()".


Thread State and GIL APIs
=========================

Python provides a set of functions and macros to manage thread state
and the GIL, such as:

* "PyGILState_Ensure()" and "PyGILState_Release()"

* "PyEval_SaveThread()" and "PyEval_RestoreThread()"

* "Py_BEGIN_ALLOW_THREADS" and "Py_END_ALLOW_THREADS"

These functions should still be used in the free-threaded build to
manage thread state even when the *GIL* is disabled.  For example, if
you create a thread outside of Python, you must call
"PyGILState_Ensure()" before calling into the Python API to ensure
that the thread has a valid Python thread state.

You should continue to call "PyEval_SaveThread()" or
"Py_BEGIN_ALLOW_THREADS" around blocking operations, such as I/O or
lock acquisitions, to allow other threads to run the *cyclic garbage
collector*.
Protecting Internal Extension State =================================== Your extension may have internal state that was previously protected by the GIL. You may need to add locking to protect this state. The approach will depend on your extension, but some common patterns include: * **Caches**: global caches are a common source of shared state. Consider using a lock to protect the cache or disabling it in the free-threaded build if the cache is not critical for performance. * **Global State**: global state may need to be protected by a lock or moved to thread local storage. C11 and C++11 provide the "thread_local" or "_Thread_local" for thread-local storage. Building Extensions for the Free-Threaded Build =============================================== C API extensions need to be built specifically for the free-threaded build. The wheels, shared libraries, and binaries are indicated by a "t" suffix. * pypa/manylinux supports the free-threaded build, with the "t" suffix, such as "python3.13t". * pypa/cibuildwheel supports the free-threaded build if you set CIBW_ENABLE to cpython-freethreading. Limited C API and Stable ABI ---------------------------- The free-threaded build does not currently support the Limited C API or the stable ABI. If you use setuptools to build your extension and currently set "py_limited_api=True" you can use "py_limited_api=not sysconfig.get_config_var("Py_GIL_DISABLED")" to opt out of the limited API when building with the free-threaded build. Note: You will need to build separate wheels specifically for the free- threaded build. If you currently use the stable ABI, you can continue to build a single wheel for multiple non-free-threaded Python versions. Windows ------- Due to a limitation of the official Windows installer, you will need to manually define "Py_GIL_DISABLED=1" when building extensions from source. See also: Porting Extension Modules to Support Free-Threading: A community- maintained porting guide for extension authors. Python experimental support for free threading ********************************************** Starting with the 3.13 release, CPython has experimental support for a build of Python called *free threading* where the *global interpreter lock* (GIL) is disabled. Free-threaded execution allows for full utilization of the available processing power by running threads in parallel on available CPU cores. While not all software will benefit from this automatically, programs designed with threading in mind will run faster on multi-core hardware. **The free-threaded mode is experimental** and work is ongoing to improve it: expect some bugs and a substantial single-threaded performance hit. This document describes the implications of free threading for Python code. See C API Extension Support for Free Threading for information on how to write C extensions that support the free-threaded build. See also: **PEP 703** – Making the Global Interpreter Lock Optional in CPython for an overall description of free-threaded Python. Installation ============ Starting with Python 3.13, the official macOS and Windows installers optionally support installing free-threaded Python binaries. The installers are available at https://www.python.org/downloads/. For information on other platforms, see the Installing a Free-Threaded Python, a community-maintained installation guide for installing free- threaded Python. When building CPython from source, the "--disable-gil" configure option should be used to build a free-threaded Python interpreter. 
Identifying free-threaded Python
================================

To check if the current interpreter supports free-threading, "python
-VV" and "sys.version" contain "experimental free-threading build".
The new "sys._is_gil_enabled()" function can be used to check whether
the GIL is actually disabled in the running process.

The "sysconfig.get_config_var("Py_GIL_DISABLED")" configuration
variable can be used to determine whether the build supports free
threading.  If the variable is set to "1", then the build supports
free threading.  This is the recommended mechanism for decisions
related to the build configuration.


The global interpreter lock in free-threaded Python
===================================================

Free-threaded builds of CPython support optionally running with the
GIL enabled at runtime using the environment variable "PYTHON_GIL" or
the command-line option "-X gil".

The GIL may also automatically be enabled when importing a C-API
extension module that is not explicitly marked as supporting free
threading.  A warning will be printed in this case.

In addition to individual package documentation, the following
websites track the status of popular packages' support for free
threading:

* https://py-free-threading.github.io/tracking/

* https://hugovk.github.io/free-threaded-wheels/


Thread safety
=============

The free-threaded build of CPython aims to provide similar thread-
safety behavior at the Python level to the default GIL-enabled build.
Built-in types like "dict", "list", and "set" use internal locks to
protect against concurrent modifications in ways that behave
similarly to the GIL.  However, Python has not historically
guaranteed specific behavior for concurrent modifications to these
built-in types, so this should be treated as a description of the
current implementation, not a guarantee of current or future
behavior.

Note: It's recommended to use "threading.Lock" or other
  synchronization primitives instead of relying on the internal locks
  of built-in types, when possible.


Known limitations
=================

This section describes known limitations of the free-threaded CPython
build.


Immortalization
---------------

The free-threaded build of the 3.13 release makes some objects
*immortal*.  Immortal objects are not deallocated and have reference
counts that are never modified.  This is done to avoid reference
count contention that would prevent efficient multi-threaded scaling.

An object will be made immortal when a new thread is started for the
first time after the main thread is running.  The following objects
are immortalized:

* function objects declared at the module level

* method descriptors

* code objects

* *module* objects and their dictionaries

* classes (type objects)

Because immortal objects are never deallocated, applications that
create many objects of these types may see increased memory usage.
This is expected to be addressed in the 3.14 release.

Additionally, numeric and string literals in the code as well as
strings returned by "sys.intern()" are also immortalized.  This
behavior is expected to remain in the 3.14 free-threaded build.


Frame objects
-------------

It is not safe to access frame objects from other threads and doing
so may cause your program to crash.  This means that
"sys._current_frames()" is generally not safe to use in a free-
threaded build.  Functions like "inspect.currentframe()" and
"sys._getframe()" are generally safe as long as the resulting frame
object is not passed to another thread.
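For example, instead of handing a frame object to a worker thread,
you can copy the plain data you need while still on the current
thread.  A minimal sketch of our own (the "report()" helper and the
chosen fields are illustrative, not from the docs):

   import inspect
   import threading

   def report(filename, lineno):
       # Receives plain immutable data, never the frame object itself.
       print(f'called from {filename}:{lineno}')

   frame = inspect.currentframe()
   # Extract the fields we need before any other thread is involved.
   info = (frame.f_code.co_filename, frame.f_lineno)
   del frame  # avoid accidentally keeping the frame alive

   t = threading.Thread(target=report, args=info)
   t.start()
   t.join()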
Iterators
---------

Sharing the same iterator object between multiple threads is
generally not safe: threads may see duplicate or missing elements
when iterating, or may crash the interpreter.


Single-threaded performance
---------------------------

The free-threaded build has additional overhead when executing Python
code compared to the default GIL-enabled build.  In 3.13, this
overhead is about 40% on the pyperformance suite.  Programs that
spend most of their time in C extensions or I/O will see less of an
impact.  The largest impact is because the specializing adaptive
interpreter (**PEP 659**) is disabled in the free-threaded build.  We
expect to re-enable it in a thread-safe way in the 3.14 release.
This overhead is expected to be reduced in upcoming Python releases.
We are aiming for an overhead of 10% or less on the pyperformance
suite compared to the default GIL-enabled build.


Functional Programming HOWTO
****************************

Author:
   A. M. Kuchling

Release:
   0.32

In this document, we'll take a tour of Python's features suitable for
implementing programs in a functional style.  After an introduction
to the concepts of functional programming, we'll look at language
features such as *iterator*s and *generator*s and relevant library
modules such as "itertools" and "functools".


Introduction
============

This section explains the basic concept of functional programming; if
you're just interested in learning about Python language features,
skip to the next section on Iterators.

Programming languages support decomposing problems in several
different ways:

* Most programming languages are **procedural**: programs are lists
  of instructions that tell the computer what to do with the
  program's input.  C, Pascal, and even Unix shells are procedural
  languages.

* In **declarative** languages, you write a specification that
  describes the problem to be solved, and the language implementation
  figures out how to perform the computation efficiently.  SQL is the
  declarative language you're most likely to be familiar with; a SQL
  query describes the data set you want to retrieve, and the SQL
  engine decides whether to scan tables or use indexes, which
  subclauses should be performed first, etc.

* **Object-oriented** programs manipulate collections of objects.
  Objects have internal state and support methods that query or
  modify this internal state in some way.  Smalltalk and Java are
  object-oriented languages.  C++ and Python are languages that
  support object-oriented programming, but don't force the use of
  object-oriented features.

* **Functional** programming decomposes a problem into a set of
  functions.  Ideally, functions only take inputs and produce
  outputs, and don't have any internal state that affects the output
  produced for a given input.  Well-known functional languages
  include the ML family (Standard ML, OCaml, and other variants) and
  Haskell.

The designers of some computer languages choose to emphasize one
particular approach to programming.  This often makes it difficult to
write programs that use a different approach.  Other languages are
multi-paradigm languages that support several different approaches.
Lisp, C++, and Python are multi-paradigm; you can write programs or
libraries that are largely procedural, object-oriented, or functional
in all of these languages.  In a large program, different sections
might be written using different approaches; the GUI might be object-
oriented while the processing logic is procedural or functional, for
example.
In a functional program, input flows through a set of functions. Each function operates on its input and produces some output. Functional style discourages functions with side effects that modify internal state or make other changes that aren’t visible in the function’s return value. Functions that have no side effects at all are called **purely functional**. Avoiding side effects means not using data structures that get updated as a program runs; every function’s output must only depend on its input. Some languages are very strict about purity and don’t even have assignment statements such as "a=3" or "c = a + b", but it’s difficult to avoid all side effects, such as printing to the screen or writing to a disk file. Another example is a call to the "print()" or "time.sleep()" function, neither of which returns a useful value. Both are called only for their side effects of sending some text to the screen or pausing execution for a second. Python programs written in functional style usually won’t go to the extreme of avoiding all I/O or all assignments; instead, they’ll provide a functional-appearing interface but will use non-functional features internally. For example, the implementation of a function will still use assignments to local variables, but won’t modify global variables or have other side effects. Functional programming can be considered the opposite of object- oriented programming. Objects are little capsules containing some internal state along with a collection of method calls that let you modify this state, and programs consist of making the right set of state changes. Functional programming wants to avoid state changes as much as possible and works with data flowing between functions. In Python you might combine the two approaches by writing functions that take and return instances representing objects in your application (e-mail messages, transactions, etc.). Functional design may seem like an odd constraint to work under. Why should you avoid objects and side effects? There are theoretical and practical advantages to the functional style: * Formal provability. * Modularity. * Composability. * Ease of debugging and testing. Formal provability ------------------ A theoretical benefit is that it’s easier to construct a mathematical proof that a functional program is correct. For a long time researchers have been interested in finding ways to mathematically prove programs correct. This is different from testing a program on numerous inputs and concluding that its output is usually correct, or reading a program’s source code and concluding that the code looks right; the goal is instead a rigorous proof that a program produces the right result for all possible inputs. The technique used to prove programs correct is to write down **invariants**, properties of the input data and of the program’s variables that are always true. For each line of code, you then show that if invariants X and Y are true **before** the line is executed, the slightly different invariants X’ and Y’ are true **after** the line is executed. This continues until you reach the end of the program, at which point the invariants should match the desired conditions on the program’s output. Functional programming’s avoidance of assignments arose because assignments are difficult to handle with this technique; assignments can break invariants that were true before the assignment without producing any new invariants that can be propagated onward. 
Unfortunately, proving programs correct is largely impractical and not relevant to Python software. Even trivial programs require proofs that are several pages long; the proof of correctness for a moderately complicated program would be enormous, and few or none of the programs you use daily (the Python interpreter, your XML parser, your web browser) could be proven correct. Even if you wrote down or generated a proof, there would then be the question of verifying the proof; maybe there’s an error in it, and you wrongly believe you’ve proved the program correct. Modularity ---------- A more practical benefit of functional programming is that it forces you to break apart your problem into small pieces. Programs are more modular as a result. It’s easier to specify and write a small function that does one thing than a large function that performs a complicated transformation. Small functions are also easier to read and to check for errors. Ease of debugging and testing ----------------------------- Testing and debugging a functional-style program is easier. Debugging is simplified because functions are generally small and clearly specified. When a program doesn’t work, each function is an interface point where you can check that the data are correct. You can look at the intermediate inputs and outputs to quickly isolate the function that’s responsible for a bug. Testing is easier because each function is a potential subject for a unit test. Functions don’t depend on system state that needs to be replicated before running a test; instead you only have to synthesize the right input and then check that the output matches expectations. Composability ------------- As you work on a functional-style program, you’ll write a number of functions with varying inputs and outputs. Some of these functions will be unavoidably specialized to a particular application, but others will be useful in a wide variety of programs. For example, a function that takes a directory path and returns all the XML files in the directory, or a function that takes a filename and returns its contents, can be applied to many different situations. Over time you’ll form a personal library of utilities. Often you’ll assemble new programs by arranging existing functions in a new configuration and writing a few functions specialized for the current task. Iterators ========= I’ll start by looking at a Python language feature that’s an important foundation for writing functional-style programs: iterators. An iterator is an object representing a stream of data; this object returns the data one element at a time. A Python iterator must support a method called "__next__()" that takes no arguments and always returns the next element of the stream. If there are no more elements in the stream, "__next__()" must raise the "StopIteration" exception. Iterators don’t have to be finite, though; it’s perfectly reasonable to write an iterator that produces an infinite stream of data. The built-in "iter()" function takes an arbitrary object and tries to return an iterator that will return the object’s contents or elements, raising "TypeError" if the object doesn’t support iteration. Several of Python’s built-in data types support iteration, the most common being lists and dictionaries. An object is called *iterable* if you can get an iterator for it. 
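To make the protocol concrete, here is a minimal hand-written
iterator. The class name and behaviour are illustrative assumptions;
the protocol itself only requires "__next__()", and the "__iter__()"
method is added so the object can also be used directly in a "for"
loop:

   class Countdown:
       """Iterator producing n, n-1, ..., 1."""

       def __init__(self, n):
           self.n = n

       def __iter__(self):
           # Iterators conventionally return themselves.
           return self

       def __next__(self):
           if self.n <= 0:
               # Signal the end of the stream.
               raise StopIteration
           value = self.n
           self.n -= 1
           return value

   for i in Countdown(3):
       print(i)        # prints 3, then 2, then 1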
You can experiment with the iteration interface manually:

>>> L = [1, 2, 3]
>>> it = iter(L)
>>> it
<...iterator object at ...>
>>> it.__next__()  # same as next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

Python expects iterable objects in several different contexts, the
most important being the "for" statement. In the statement "for X in
Y", Y must be an iterator or some object for which "iter()" can
create an iterator. These two statements are equivalent:

   for i in iter(obj):
       print(i)

   for i in obj:
       print(i)

Iterators can be materialized as lists or tuples by using the
"list()" or "tuple()" constructor functions:

>>> L = [1, 2, 3]
>>> iterator = iter(L)
>>> t = tuple(iterator)
>>> t
(1, 2, 3)

Sequence unpacking also supports iterators: if you know an iterator
will return N elements, you can unpack them into an N-tuple:

>>> L = [1, 2, 3]
>>> iterator = iter(L)
>>> a, b, c = iterator
>>> a, b, c
(1, 2, 3)

Built-in functions such as "max()" and "min()" can take a single
iterator argument and will return the largest or smallest element.
The ""in"" and ""not in"" operators also support iterators: "X in
iterator" is true if X is found in the stream returned by the
iterator. You’ll run into obvious problems if the iterator is
infinite; "max()" and "min()" will never return, and if the element X
never appears in the stream, the ""in"" and ""not in"" operators
won’t return either.

Note that you can only go forward in an iterator; there’s no way to
get the previous element, reset the iterator, or make a copy of it.
Iterator objects can optionally provide these additional
capabilities, but the iterator protocol only specifies the
"__next__()" method. Functions may therefore consume all of the
iterator’s output, and if you need to do something different with the
same stream, you’ll have to create a new iterator.

Data Types That Support Iterators
---------------------------------

We’ve already seen how lists and tuples support iterators. In fact,
any Python sequence type, such as strings, will automatically support
creation of an iterator.

Calling "iter()" on a dictionary returns an iterator that will loop
over the dictionary’s keys:

>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m:
...     print(key, m[key])
Jan 1
Feb 2
Mar 3
Apr 4
May 5
Jun 6
Jul 7
Aug 8
Sep 9
Oct 10
Nov 11
Dec 12

Note that starting with Python 3.7, dictionary iteration order is
guaranteed to be the same as the insertion order. In earlier
versions, the behaviour was unspecified and could vary between
implementations.

Applying "iter()" to a dictionary always loops over the keys, but
dictionaries have methods that return other iterators. If you want to
iterate over values or key/value pairs, you can explicitly call the
"values()" or "items()" methods to get an appropriate iterator.

The "dict()" constructor can accept an iterator that returns a finite
stream of "(key, value)" tuples:

>>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
>>> dict(iter(L))
{'Italy': 'Rome', 'France': 'Paris', 'US': 'Washington DC'}

Files also support iteration by calling the "readline()" method until
there are no more lines in the file. This means you can read each
line of a file like this:

   for line in file:
       # do something for each line
       ...
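A quick interactive check of the "values()" and "items()" methods
mentioned above (the two-entry dictionary is my own example):

>>> m = {'Jan': 1, 'Feb': 2}
>>> list(m.values())
[1, 2]
>>> for month, number in m.items():
...     print(month, number)
Jan 1
Feb 2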
Sets can take their contents from an iterable and let you iterate over the set’s elements: >>> S = {2, 3, 5, 7, 11, 13} >>> for i in S: ... print(i) 2 3 5 7 11 13 Generator expressions and list comprehensions ============================================= Two common operations on an iterator’s output are 1) performing some operation for every element, 2) selecting a subset of elements that meet some condition. For example, given a list of strings, you might want to strip off trailing whitespace from each line or extract all the strings containing a given substring. List comprehensions and generator expressions (short form: “listcomps” and “genexps”) are a concise notation for such operations, borrowed from the functional programming language Haskell (https://www.haskell.org/). You can strip all the whitespace from a stream of strings with the following code: >>> line_list = [' line 1\n', 'line 2 \n', ' \n', ''] >>> # Generator expression -- returns iterator >>> stripped_iter = (line.strip() for line in line_list) >>> # List comprehension -- returns list >>> stripped_list = [line.strip() for line in line_list] You can select only certain elements by adding an ""if"" condition: >>> stripped_list = [line.strip() for line in line_list ... if line != ""] With a list comprehension, you get back a Python list; "stripped_list" is a list containing the resulting lines, not an iterator. Generator expressions return an iterator that computes the values as necessary, not needing to materialize all the values at once. This means that list comprehensions aren’t useful if you’re working with iterators that return an infinite stream or a very large amount of data. Generator expressions are preferable in these situations. Generator expressions are surrounded by parentheses (“()”) and list comprehensions are surrounded by square brackets (“[]”). Generator expressions have the form: ( expression for expr in sequence1 if condition1 for expr2 in sequence2 if condition2 for expr3 in sequence3 ... if condition3 for exprN in sequenceN if conditionN ) Again, for a list comprehension only the outside brackets are different (square brackets instead of parentheses). The elements of the generated output will be the successive values of "expression". The "if" clauses are all optional; if present, "expression" is only evaluated and added to the result when "condition" is true. Generator expressions always have to be written inside parentheses, but the parentheses signalling a function call also count. If you want to create an iterator that will be immediately passed to a function you can write: obj_total = sum(obj.count for obj in list_all_objects()) The "for...in" clauses contain the sequences to be iterated over. The sequences do not have to be the same length, because they are iterated over from left to right, **not** in parallel. For each element in "sequence1", "sequence2" is looped over from the beginning. "sequence3" is then looped over for each resulting pair of elements from "sequence1" and "sequence2". To put it another way, a list comprehension or generator expression is equivalent to the following Python code: for expr1 in sequence1: if not (condition1): continue # Skip this element for expr2 in sequence2: if not (condition2): continue # Skip this element ... for exprN in sequenceN: if not (conditionN): continue # Skip this element # Output the value of # the expression. 
This means that when there are multiple "for...in" clauses but no
"if" clauses, the length of the resulting output will be equal to the
product of the lengths of all the sequences. If you have two lists of
length 3, the output list is 9 elements long:

>>> seq1 = 'abc'
>>> seq2 = (1, 2, 3)
>>> [(x, y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3),
 ('b', 1), ('b', 2), ('b', 3),
 ('c', 1), ('c', 2), ('c', 3)]

To avoid introducing an ambiguity into Python’s grammar, if
"expression" is creating a tuple, it must be surrounded with
parentheses. The first list comprehension below is a syntax error,
while the second one is correct:

   # Syntax error
   [x, y for x in seq1 for y in seq2]
   # Correct
   [(x, y) for x in seq1 for y in seq2]

Generators
==========

Generators are a special class of functions that simplify the task of
writing iterators. Regular functions compute a value and return it,
but generators return an iterator that returns a stream of values.

You’re doubtless familiar with how regular function calls work in
Python or C. When you call a function, it gets a private namespace
where its local variables are created. When the function reaches a
"return" statement, the local variables are destroyed and the value
is returned to the caller. A later call to the same function creates
a new private namespace and a fresh set of local variables. But, what
if the local variables weren’t thrown away on exiting a function?
What if you could later resume the function where it left off? This
is what generators provide; they can be thought of as resumable
functions.

Here’s the simplest example of a generator function:

>>> def generate_ints(N):
...     for i in range(N):
...         yield i

Any function containing a "yield" keyword is a generator function;
this is detected by Python’s *bytecode* compiler which compiles the
function specially as a result.

When you call a generator function, it doesn’t return a single value;
instead it returns a generator object that supports the iterator
protocol. On executing the "yield" expression, the generator outputs
the value of "i", similar to a "return" statement. The big difference
between "yield" and a "return" statement is that on reaching a
"yield" the generator’s state of execution is suspended and local
variables are preserved. On the next call to the generator’s
"__next__()" method, the function will resume executing.

Here’s a sample usage of the "generate_ints()" generator:

>>> gen = generate_ints(3)
>>> gen
<generator object generate_ints at ...>
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
Traceback (most recent call last):
  File "stdin", line 1, in <module>
  File "stdin", line 2, in generate_ints
StopIteration

You could equally write "for i in generate_ints(5)", or "a, b, c =
generate_ints(3)".

Inside a generator function, "return value" causes
"StopIteration(value)" to be raised from the "__next__()" method.
Once this happens, or the bottom of the function is reached, the
procession of values ends and the generator cannot yield any further
values.

You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting "self.count" to 0, and having the "__next__()"
method increment "self.count" and return it. However, for a
moderately complicated generator, writing a corresponding class can
be much messier.

The test suite included with Python’s library,
Lib/test/test_generators.py, contains a number of more interesting
examples.
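Here is a minimal sketch of that class-based approach, written to
behave like the "generate_ints()" generator above (the class name is
my own):

   class GenerateInts:
       """Class-based equivalent of the generate_ints() generator."""

       def __init__(self, N):
           self.count = 0
           self.N = N

       def __iter__(self):
           return self

       def __next__(self):
           if self.count >= self.N:
               raise StopIteration
           value = self.count    # return the current value...
           self.count += 1       # ...and remember where we left off
           return value

   print(list(GenerateInts(3)))  # [0, 1, 2]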
Here’s one generator that implements an in-order traversal of a tree
using generators recursively.

   # A recursive generator that generates Tree leaves in in-order.
   def inorder(t):
       if t:
           for x in inorder(t.left):
               yield x

           yield t.label

           for x in inorder(t.right):
               yield x

Two other examples in "test_generators.py" produce solutions for the
N-Queens problem (placing N queens on an NxN chess board so that no
queen threatens another) and the Knight’s Tour (finding a route that
takes a knight to every square of an NxN chessboard without visiting
any square twice).

Passing values into a generator
-------------------------------

In Python 2.4 and earlier, generators only produced output. Once a
generator’s code was invoked to create an iterator, there was no way
to pass any new information into the function when its execution was
resumed. You could hack together this ability by making the generator
look at a global variable or by passing in some mutable object that
callers then modify, but these approaches are messy.

In Python 2.5 there’s a simple way to pass values into a generator.
"yield" became an expression, returning a value that can be assigned
to a variable or otherwise operated on:

   val = (yield i)

I recommend that you **always** put parentheses around a "yield"
expression when you’re doing something with the returned value, as in
the above example. The parentheses aren’t always necessary, but it’s
easier to always add them instead of having to remember when they’re
needed.

(**PEP 342** explains the exact rules, which are that a
"yield"-expression must always be parenthesized except when it occurs
at the top-level expression on the right-hand side of an assignment.
This means you can write "val = yield i" but have to use parentheses
when there’s an operation, as in "val = (yield i) + 12".)

Values are sent into a generator by calling its "send(value)" method.
This method resumes the generator’s code and the "yield" expression
returns the specified value. If the regular "__next__()" method is
called, the "yield" returns "None".

Here’s a simple counter that increments by 1 and allows changing the
value of the internal counter.

   def counter(maximum):
       i = 0
       while i < maximum:
           val = (yield i)
           # If value provided, change counter
           if val is not None:
               i = val
           else:
               i += 1

And here’s an example of changing the counter:

>>> it = counter(10)
>>> next(it)
0
>>> next(it)
1
>>> it.send(8)
8
>>> next(it)
9
>>> next(it)
Traceback (most recent call last):
  File "t.py", line 15, in <module>
    it.next()
StopIteration

Because "yield" will often be returning "None", you should always
check for this case. Don’t just use its value in expressions unless
you’re sure that the "send()" method will be the only method used to
resume your generator function.

In addition to "send()", there are two other methods on generators:

* "throw(value)" is used to raise an exception inside the generator;
  the exception is raised by the "yield" expression where the
  generator’s execution is paused.

* "close()" raises a "GeneratorExit" exception inside the generator
  to terminate the iteration. On receiving this exception, the
  generator’s code must either raise "GeneratorExit" or
  "StopIteration"; catching the exception and doing anything else is
  illegal and will trigger a "RuntimeError". "close()" will also be
  called by Python’s garbage collector when the generator is
  garbage-collected.

  If you need to run cleanup code when a "GeneratorExit" occurs, I
  suggest using a "try: ... finally:" suite instead of catching
  "GeneratorExit".
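As a quick illustration of these two methods, using the "counter()"
generator defined above (the tracebacks are abbreviated; the exact
output depends on your session):

>>> it = counter(10)
>>> next(it)
0
>>> it.throw(ValueError('bad value'))  # raised at the paused "yield"
Traceback (most recent call last):
  ...
ValueError: bad value
>>> it = counter(10)
>>> next(it)
0
>>> it.close()    # GeneratorExit ends the iteration
>>> next(it)
Traceback (most recent call last):
  ...
StopIteration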
The cumulative effect of these changes is to turn generators from
one-way producers of information into both producers and consumers.

Generators also become **coroutines**, a more generalized form of
subroutines. Subroutines are entered at one point and exited at
another point (the top of the function, and a "return" statement),
but coroutines can be entered, exited, and resumed at many different
points (the "yield" statements).

Built-in functions
==================

Let’s look in more detail at built-in functions often used with
iterators.

Two of Python’s built-in functions, "map()" and "filter()", duplicate
the features of generator expressions:

"map(f, iterA, iterB, ...)" returns an iterator over the sequence
"f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]),
...".

>>> def upper(s):
...     return s.upper()

>>> list(map(upper, ['sentence', 'fragment']))
['SENTENCE', 'FRAGMENT']
>>> [upper(s) for s in ['sentence', 'fragment']]
['SENTENCE', 'FRAGMENT']

You can of course achieve the same effect with a list comprehension.

"filter(predicate, iter)" returns an iterator over all the sequence
elements that meet a certain condition, and is similarly duplicated
by list comprehensions. A **predicate** is a function that returns
the truth value of some condition; for use with "filter()", the
predicate must take a single value.

>>> def is_even(x):
...     return (x % 2) == 0

>>> list(filter(is_even, range(10)))
[0, 2, 4, 6, 8]

This can also be written as a list comprehension:

>>> list(x for x in range(10) if is_even(x))
[0, 2, 4, 6, 8]

"enumerate(iter, start=0)" counts off the elements in the iterable,
returning 2-tuples containing the count (from *start*) and each
element.

>>> for item in enumerate(['subject', 'verb', 'object']):
...     print(item)
(0, 'subject')
(1, 'verb')
(2, 'object')

"enumerate()" is often used when looping through a list and recording
the indexes at which certain conditions are met:

   f = open('data.txt', 'r')
   for i, line in enumerate(f):
       if line.strip() == '':
           print('Blank line at line #%i' % i)

"sorted(iterable, key=None, reverse=False)" collects all the elements
of the iterable into a list, sorts the list, and returns the sorted
result. The *key* and *reverse* arguments are passed through to the
constructed list’s "sort()" method.

>>> import random
>>> # Generate 8 random numbers between [0, 10000)
>>> rand_list = random.sample(range(10000), 8)
>>> rand_list
[769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
>>> sorted(rand_list)
[769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
>>> sorted(rand_list, reverse=True)
[9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]

(For a more detailed discussion of sorting, see the Sorting
Techniques.)

The "any(iter)" and "all(iter)" built-ins look at the truth values of
an iterable’s contents. "any()" returns "True" if any element in the
iterable is a true value, and "all()" returns "True" if all of the
elements are true values:

>>> any([0, 1, 0])
True
>>> any([0, 0, 0])
False
>>> any([1, 1, 1])
True
>>> all([0, 1, 0])
False
>>> all([0, 0, 0])
False
>>> all([1, 1, 1])
True

"zip(iterA, iterB, ...)" takes one element from each iterable and
returns them in a tuple:

   zip(['a', 'b', 'c'], (1, 2, 3)) =>
     ('a', 1), ('b', 2), ('c', 3)

It doesn’t construct an in-memory list and exhaust all the input
iterators before returning; instead tuples are constructed and
returned only if they’re requested. (The technical term for this
behaviour is lazy evaluation.)

This iterator is intended to be used with iterables that are all of
the same length.
If the iterables are of different lengths, the resulting stream will be the same length as the shortest iterable. zip(['a', 'b'], (1, 2, 3)) => ('a', 1), ('b', 2) You should avoid doing this, though, because an element may be taken from the longer iterators and discarded. This means you can’t go on to use the iterators further because you risk skipping a discarded element. The itertools module ==================== The "itertools" module contains a number of commonly used iterators as well as functions for combining several iterators. This section will introduce the module’s contents by showing small examples. The module’s functions fall into a few broad classes: * Functions that create a new iterator based on an existing iterator. * Functions for treating an iterator’s elements as function arguments. * Functions for selecting portions of an iterator’s output. * A function for grouping an iterator’s output. Creating new iterators ---------------------- "itertools.count(start, step)" returns an infinite stream of evenly spaced values. You can optionally supply the starting number, which defaults to 0, and the interval between numbers, which defaults to 1: itertools.count() => 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... itertools.count(10) => 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... itertools.count(10, 5) => 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, ... "itertools.cycle(iter)" saves a copy of the contents of a provided iterable and returns a new iterator that returns its elements from first to last. The new iterator will repeat these elements infinitely. itertools.cycle([1, 2, 3, 4, 5]) => 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ... "itertools.repeat(elem, [n])" returns the provided element *n* times, or returns the element endlessly if *n* is not provided. itertools.repeat('abc') => abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ... itertools.repeat('abc', 5) => abc, abc, abc, abc, abc "itertools.chain(iterA, iterB, ...)" takes an arbitrary number of iterables as input, and returns all the elements of the first iterator, then all the elements of the second, and so on, until all of the iterables have been exhausted. itertools.chain(['a', 'b', 'c'], (1, 2, 3)) => a, b, c, 1, 2, 3 "itertools.islice(iter, [start], stop, [step])" returns a stream that’s a slice of the iterator. With a single *stop* argument, it will return the first *stop* elements. If you supply a starting index, you’ll get *stop-start* elements, and if you supply a value for *step*, elements will be skipped accordingly. Unlike Python’s string and list slicing, you can’t use negative values for *start*, *stop*, or *step*. itertools.islice(range(10), 8) => 0, 1, 2, 3, 4, 5, 6, 7 itertools.islice(range(10), 2, 8) => 2, 3, 4, 5, 6, 7 itertools.islice(range(10), 2, 8, 2) => 2, 4, 6 "itertools.tee(iter, [n])" replicates an iterator; it returns *n* independent iterators that will all return the contents of the source iterator. If you don’t supply a value for *n*, the default is 2. Replicating iterators requires saving some of the contents of the source iterator, so this can consume significant memory if the iterator is large and one of the new iterators is consumed more than the others. itertools.tee( itertools.count() ) => iterA, iterB where iterA -> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... and iterB -> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... Calling functions on elements ----------------------------- The "operator" module contains a set of functions corresponding to Python’s operators. 
Some examples are "operator.add(a, b)" (adds two values),
"operator.ne(a, b)" (same as "a != b"), and
"operator.attrgetter('id')" (returns a callable that fetches the
".id" attribute).

"itertools.starmap(func, iter)" assumes that the iterable will return
a stream of tuples, and calls *func* using these tuples as the
arguments:

   itertools.starmap(os.path.join,
                     [('/bin', 'python'), ('/usr', 'bin', 'java'),
                      ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')])
   =>
     /bin/python, /usr/bin/java, /usr/bin/perl, /usr/bin/ruby

Selecting elements
------------------

Another group of functions chooses a subset of an iterator’s elements
based on a predicate.

"itertools.filterfalse(predicate, iter)" is the opposite of
"filter()", returning all elements for which the predicate returns
false:

   itertools.filterfalse(is_even, itertools.count()) =>
     1, 3, 5, 7, 9, 11, 13, 15, ...

"itertools.takewhile(predicate, iter)" returns elements for as long
as the predicate returns true. Once the predicate returns false, the
iterator will signal the end of its results.

   def less_than_10(x):
       return x < 10

   itertools.takewhile(less_than_10, itertools.count()) =>
     0, 1, 2, 3, 4, 5, 6, 7, 8, 9

   itertools.takewhile(is_even, itertools.count()) =>
     0

"itertools.dropwhile(predicate, iter)" discards elements while the
predicate returns true, and then returns the rest of the iterable’s
results.

   itertools.dropwhile(less_than_10, itertools.count()) =>
     10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...

   itertools.dropwhile(is_even, itertools.count()) =>
     1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

"itertools.compress(data, selectors)" takes two iterators and returns
only those elements of *data* for which the corresponding element of
*selectors* is true, stopping whenever either one is exhausted:

   itertools.compress([1, 2, 3, 4, 5], [True, True, False, False, True]) =>
      1, 2, 5

Combinatoric functions
----------------------

"itertools.combinations(iterable, r)" returns an iterator giving all
possible *r*-tuple combinations of the elements contained in
*iterable*.

   itertools.combinations([1, 2, 3, 4, 5], 2) =>
     (1, 2), (1, 3), (1, 4), (1, 5),
     (2, 3), (2, 4), (2, 5),
     (3, 4), (3, 5),
     (4, 5)

   itertools.combinations([1, 2, 3, 4, 5], 3) =>
     (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5),
     (2, 3, 4), (2, 3, 5), (2, 4, 5),
     (3, 4, 5)

The elements within each tuple remain in the same order as *iterable*
returned them. For example, the number 1 is always before 2, 3, 4, or
5 in the examples above. A similar function,
"itertools.permutations(iterable, r=None)", removes this constraint
on the order, returning all possible arrangements of length *r*:

   itertools.permutations([1, 2, 3, 4, 5], 2) =>
     (1, 2), (1, 3), (1, 4), (1, 5),
     (2, 1), (2, 3), (2, 4), (2, 5),
     (3, 1), (3, 2), (3, 4), (3, 5),
     (4, 1), (4, 2), (4, 3), (4, 5),
     (5, 1), (5, 2), (5, 3), (5, 4)

   itertools.permutations([1, 2, 3, 4, 5]) =>
     (1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (1, 2, 4, 3, 5),
     ...
     (5, 4, 3, 2, 1)

If you don’t supply a value for *r* the length of the iterable is
used, meaning that all the elements are permuted.

Note that these functions produce all of the possible combinations by
position and don’t require that the contents of *iterable* are
unique:

   itertools.permutations('aba', 3) =>
     ('a', 'b', 'a'), ('a', 'a', 'b'), ('b', 'a', 'a'),
     ('b', 'a', 'a'), ('a', 'a', 'b'), ('a', 'b', 'a')

The identical tuple "('a', 'a', 'b')" occurs twice, but the two ‘a’
strings came from different positions.
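Since these functions return ordinary iterators, you can materialize
their output with "list()" to experiment; a quick interactive check
(the input string is my own example):

>>> import itertools
>>> list(itertools.combinations('abc', 2))
[('a', 'b'), ('a', 'c'), ('b', 'c')]
>>> list(itertools.permutations('abc', 2))
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]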
The "itertools.combinations_with_replacement(iterable, r)" function relaxes a different constraint: elements can be repeated within a single tuple. Conceptually an element is selected for the first position of each tuple and then is replaced before the second element is selected. itertools.combinations_with_replacement([1, 2, 3, 4, 5], 2) => (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (4, 4), (4, 5), (5, 5) Grouping elements ----------------- The last function I’ll discuss, "itertools.groupby(iter, key_func=None)", is the most complicated. "key_func(elem)" is a function that can compute a key value for each element returned by the iterable. If you don’t supply a key function, the key is simply each element itself. "groupby()" collects all the consecutive elements from the underlying iterable that have the same key value, and returns a stream of 2-tuples containing a key value and an iterator for the elements with that key. city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'), ('Anchorage', 'AK'), ('Nome', 'AK'), ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'), ... ] def get_state(city_state): return city_state[1] itertools.groupby(city_list, get_state) => ('AL', iterator-1), ('AK', iterator-2), ('AZ', iterator-3), ... where iterator-1 => ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL') iterator-2 => ('Anchorage', 'AK'), ('Nome', 'AK') iterator-3 => ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ') "groupby()" assumes that the underlying iterable’s contents will already be sorted based on the key. Note that the returned iterators also use the underlying iterable, so you have to consume the results of iterator-1 before requesting iterator-2 and its corresponding key. The functools module ==================== The "functools" module contains some higher-order functions. A **higher-order function** takes one or more functions as input and returns a new function. The most useful tool in this module is the "functools.partial()" function. For programs written in a functional style, you’ll sometimes want to construct variants of existing functions that have some of the parameters filled in. Consider a Python function "f(a, b, c)"; you may wish to create a new function "g(b, c)" that’s equivalent to "f(1, b, c)"; you’re filling in a value for one of "f()"’s parameters. This is called “partial function application”. The constructor for "partial()" takes the arguments "(function, arg1, arg2, ..., kwarg1=value1, kwarg2=value2)". The resulting object is callable, so you can just call it to invoke "function" with the filled-in arguments. Here’s a small but realistic example: import functools def log(message, subsystem): """Write the contents of 'message' to the specified subsystem.""" print('%s: %s' % (subsystem, message)) ... server_log = functools.partial(log, subsystem='server') server_log('Unable to open socket') "functools.reduce(func, iter, [initial_value])" cumulatively performs an operation on all the iterable’s elements and, therefore, can’t be applied to infinite iterables. *func* must be a function that takes two elements and returns a single value. "functools.reduce()" takes the first two elements A and B returned by the iterator and calculates "func(A, B)". It then requests the third element, C, calculates "func(func(A, B), C)", combines this result with the fourth element returned, and continues until the iterable is exhausted. 
If the iterable returns no values at all, a "TypeError" exception is raised. If the initial value is supplied, it’s used as a starting point and "func(initial_value, A)" is the first calculation. >>> import operator, functools >>> functools.reduce(operator.concat, ['A', 'BB', 'C']) 'ABBC' >>> functools.reduce(operator.concat, []) Traceback (most recent call last): ... TypeError: reduce() of empty sequence with no initial value >>> functools.reduce(operator.mul, [1, 2, 3], 1) 6 >>> functools.reduce(operator.mul, [], 1) 1 If you use "operator.add()" with "functools.reduce()", you’ll add up all the elements of the iterable. This case is so common that there’s a special built-in called "sum()" to compute it: >>> import functools, operator >>> functools.reduce(operator.add, [1, 2, 3, 4], 0) 10 >>> sum([1, 2, 3, 4]) 10 >>> sum([]) 0 For many uses of "functools.reduce()", though, it can be clearer to just write the obvious "for" loop: import functools # Instead of: product = functools.reduce(operator.mul, [1, 2, 3], 1) # You can write: product = 1 for i in [1, 2, 3]: product *= i A related function is "itertools.accumulate(iterable, func=operator.add)". It performs the same calculation, but instead of returning only the final result, "accumulate()" returns an iterator that also yields each partial result: itertools.accumulate([1, 2, 3, 4, 5]) => 1, 3, 6, 10, 15 itertools.accumulate([1, 2, 3, 4, 5], operator.mul) => 1, 2, 6, 24, 120 The operator module ------------------- The "operator" module was mentioned earlier. It contains a set of functions corresponding to Python’s operators. These functions are often useful in functional-style code because they save you from writing trivial functions that perform a single operation. Some of the functions in this module are: * Math operations: "add()", "sub()", "mul()", "floordiv()", "abs()", … * Logical operations: "not_()", "truth()". * Bitwise operations: "and_()", "or_()", "invert()". * Comparisons: "eq()", "ne()", "lt()", "le()", "gt()", and "ge()". * Object identity: "is_()", "is_not()". Consult the operator module’s documentation for a complete list. Small functions and the lambda expression ========================================= When writing functional-style programs, you’ll often need little functions that act as predicates or that combine elements in some way. If there’s a Python built-in or a module function that’s suitable, you don’t need to define a new function at all: stripped_lines = [line.strip() for line in lines] existing_files = filter(os.path.exists, file_list) If the function you need doesn’t exist, you need to write it. One way to write small functions is to use the "lambda" expression. "lambda" takes a number of parameters and an expression combining these parameters, and creates an anonymous function that returns the value of the expression: adder = lambda x, y: x+y print_assign = lambda name, value: name + '=' + str(value) An alternative is to just use the "def" statement and define a function in the usual way: def adder(x, y): return x + y def print_assign(name, value): return name + '=' + str(value) Which alternative is preferable? That’s a style question; my usual course is to avoid using "lambda". One reason for my preference is that "lambda" is quite limited in the functions it can define. The result has to be computable as a single expression, which means you can’t have multiway "if... elif... else" comparisons or "try... except" statements. 
If you try to do too much in a "lambda" statement, you’ll end up with
an overly complicated expression that’s hard to read. Quick, what’s
the following code doing?

   import functools
   total = functools.reduce(lambda a, b: (0, a[1] + b[1]), items)[1]

You can figure it out, but it takes time to disentangle the
expression to figure out what’s going on. Using a short nested "def"
statement makes things a little bit better:

   import functools

   def combine(a, b):
       return 0, a[1] + b[1]

   total = functools.reduce(combine, items)[1]

But it would be best of all if I had simply used a "for" loop:

   total = 0
   for a, b in items:
       total += b

Or the "sum()" built-in and a generator expression:

   total = sum(b for a, b in items)

Many uses of "functools.reduce()" are clearer when written as "for"
loops.

Fredrik Lundh once suggested the following set of rules for
refactoring uses of "lambda":

1. Write a lambda function.

2. Write a comment explaining what the heck that lambda does.

3. Study the comment for a while, and think of a name that captures
   the essence of the comment.

4. Convert the lambda to a def statement, using that name.

5. Remove the comment.

I really like these rules, but you’re free to disagree about whether
this lambda-free style is better.

Revision History and Acknowledgements
=====================================

The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Ian Bicking, Nick Coghlan, Nick Efford, Raymond Hettinger,
Jim Jewett, Mike Krell, Leandro Lameiro, Jussi Salmela, Collin
Winter, Blake Winton.

Version 0.1: posted June 30 2006.

Version 0.11: posted July 1 2006. Typo fixes.

Version 0.2: posted July 10 2006. Merged genexp and listcomp sections
into one. Typo fixes.

Version 0.21: Added more references suggested on the tutor mailing
list.

Version 0.30: Adds a section on the "functional" module written by
Collin Winter; adds short section on the operator module; a few other
edits.

References
==========

General
-------

**Structure and Interpretation of Computer Programs**, by Harold
Abelson and Gerald Jay Sussman with Julie Sussman. The book can be
found at https://mitpress.mit.edu/sicp. In this classic textbook of
computer science, chapters 2 and 3 discuss the use of sequences and
streams to organize the data flow inside a program. The book uses
Scheme for its examples, but many of the design approaches described
in these chapters are applicable to functional-style Python code.

https://www.defmacro.org/ramblings/fp.html: A general introduction to
functional programming that uses Java examples and has a lengthy
historical introduction.

https://en.wikipedia.org/wiki/Functional_programming: General
Wikipedia entry describing functional programming.

https://en.wikipedia.org/wiki/Coroutine: Entry for coroutines.

https://en.wikipedia.org/wiki/Partial_application: Entry for the
concept of partial function application.

https://en.wikipedia.org/wiki/Currying: Entry for the concept of
currying.

Python-specific
---------------

https://gnosis.cx/TPiP/: The first chapter of David Mertz’s book
*Text Processing in Python* discusses functional programming for text
processing, in the section titled “Utilizing Higher-Order Functions
in Text Processing”. Mertz also wrote a 3-part series of articles on
functional programming for IBM’s DeveloperWorks site; see part 1,
part 2, and part 3.

Python documentation
--------------------

Documentation for the "itertools" module.

Documentation for the "functools" module.
Documentation for the "operator" module.

**PEP 289**: “Generator Expressions”

**PEP 342**: “Coroutines via Enhanced Generators” describes the new
generator features in Python 2.5.


Debugging C API extensions and CPython Internals with GDB
*********************************************************

This document explains how the Python GDB extension, "python-gdb.py",
can be used with the GDB debugger to debug CPython extensions and the
CPython interpreter itself.

When debugging low-level problems such as crashes or deadlocks, a
low-level debugger, such as GDB, is useful to diagnose and correct
the issue. By default, GDB (or any of its front-ends) doesn’t support
high-level information specific to the CPython interpreter.

The "python-gdb.py" extension adds CPython interpreter information to
GDB. The extension helps introspect the stack of currently executing
Python functions. Given a Python object represented by a PyObject*
pointer, the extension surfaces the type and value of the object.

Developers who are working on CPython extensions or tinkering with
parts of CPython that are written in C can use this document to learn
how to use the "python-gdb.py" extension with GDB.

Note: This document assumes that you are familiar with the basics of
GDB and the CPython C API. It consolidates guidance from the devguide
and the Python wiki.

Prerequisites
=============

You need to have:

* GDB 7 or later. (For earlier versions of GDB, see "Misc/gdbinit" in
  the sources of Python 3.11 or earlier.)

* GDB-compatible debugging information for Python and any extension
  you are debugging.

* The "python-gdb.py" extension.

The extension is built with Python, but might be distributed
separately or not at all. Below, we include tips for a few common
systems as examples. Note that even if the instructions match your
system, they might be outdated.

Setup with Python built from source
-----------------------------------

When you build CPython from source, debugging information should be
available, and the build should add a "python-gdb.py" file to the
root directory of your repository.

To activate support, you must add the directory containing
"python-gdb.py" to GDB’s “auto-load-safe-path”. If you haven’t done
this, recent versions of GDB will print out a warning with
instructions on how to do this.

Note: If you do not see instructions for your version of GDB, put
this in your configuration file ("~/.gdbinit" or
"~/.config/gdb/gdbinit"):

   add-auto-load-safe-path /path/to/cpython

You can also add multiple paths, separated by ":".

Setup for Python from a Linux distro
------------------------------------

Most Linux systems provide debug information for the system Python in
a package called "python-debuginfo", "python-dbg" or similar. For
example:

* Fedora:

     sudo dnf install gdb
     sudo dnf debuginfo-install python3

* Ubuntu:

     sudo apt install gdb python3-dbg

On several recent Linux systems, GDB can download debugging symbols
automatically using *debuginfod*. However, this will not install the
"python-gdb.py" extension; you generally do need to install the debug
info package separately.

Using the Debug build and Development mode
==========================================

For easier debugging, you might want to:

* Use a debug build of Python. (When building from source, use
  "configure --with-pydebug". On Linux distros, install and run a
  package like "python-debug" or "python-dbg", if available.)

* Use the runtime development mode ("-X dev").

Both enable extra assertions and disable some optimizations.
Sometimes these hide the bug you are trying to find, but in most
cases they make the process easier.

Using the "python-gdb" extension
================================

When the extension is loaded, it provides two main features: pretty
printers for Python values, and additional commands.

Pretty-printers
---------------

This is what a GDB backtrace looks like (truncated) when this
extension is enabled:

   #0 0x000000000041a6b1 in PyObject_Malloc (nbytes=Cannot access memory at address 0x7fffff7fefe8
   ) at Objects/obmalloc.c:748
   #1 0x000000000041b7c0 in _PyObject_DebugMallocApi (id=111 'o', nbytes=24) at Objects/obmalloc.c:1445
   #2 0x000000000041b717 in _PyObject_DebugMalloc (nbytes=24) at Objects/obmalloc.c:1412
   #3 0x000000000044060a in _PyUnicode_New (length=11) at Objects/unicodeobject.c:346
   #4 0x00000000004466aa in PyUnicodeUCS2_DecodeUTF8Stateful (s=0x5c2b8d "__lltrace__", size=11, errors=0x0, consumed=0x0) at Objects/unicodeobject.c:2531
   #5 0x0000000000446647 in PyUnicodeUCS2_DecodeUTF8 (s=0x5c2b8d "__lltrace__", size=11, errors=0x0) at Objects/unicodeobject.c:2495
   #6 0x0000000000440d1b in PyUnicodeUCS2_FromStringAndSize (u=0x5c2b8d "__lltrace__", size=11) at Objects/unicodeobject.c:551
   #7 0x0000000000440d94 in PyUnicodeUCS2_FromString (u=0x5c2b8d "__lltrace__") at Objects/unicodeobject.c:569
   #8 0x0000000000584abd in PyDict_GetItemString (v={'Yuck': , '__builtins__': , '__file__': 'Lib/test/crashers/nasty_eq_vs_dict.py', '__package__': None, 'y': , 'dict': {0: 0, 1: 1, 2: 2, 3: 3}, '__cached__': None, '__name__': '__main__', 'z': , '__doc__': None}, key=0x5c2b8d "__lltrace__") at Objects/dictobject.c:2171

Notice how the dictionary argument to "PyDict_GetItemString" is
displayed as its "repr()", rather than an opaque "PyObject *"
pointer.

The extension works by supplying a custom printing routine for values
of type "PyObject *". If you need to access lower-level details of an
object, then cast the value to a pointer of the appropriate type. For
example:

   (gdb) p globals
   $1 = {'__builtins__': , '__name__': '__main__', 'ctypes': , '__doc__': None, '__package__': None}

   (gdb) p *(PyDictObject*)globals
   $2 = {ob_refcnt = 3, ob_type = 0x3dbdf85820, ma_fill = 5, ma_used = 5, ma_mask = 7, ma_table = 0x63d0f8, ma_lookup = 0x3dbdc7ea70 , ma_smalltable = {{me_hash = 7065186196740147912, me_key = '__builtins__', me_value = }, {me_hash = -368181376027291943, me_key = '__name__', me_value = '__main__'}, {me_hash = 0, me_key = 0x0, me_value = 0x0}, {me_hash = 0, me_key = 0x0, me_value = 0x0}, {me_hash = -9177857982131165996, me_key = 'ctypes', me_value = }, {me_hash = -8518757509529533123, me_key = '__doc__', me_value = None}, {me_hash = 0, me_key = 0x0, me_value = 0x0}, {me_hash = 6614918939584953775, me_key = '__package__', me_value = None}}}

Note that the pretty-printers do not actually call "repr()". For
basic types, they try to match its result closely.

An area that can be confusing is that the custom printers for some
types look a lot like GDB’s built-in printers for standard types.
For example, the pretty-printer for a Python "int" (PyLongObject*)
gives a representation that is not distinguishable from that of a
regular machine-level integer:

   (gdb) p some_machine_integer
   $3 = 42

   (gdb) p some_python_integer
   $4 = 42

The internal structure can be revealed with a cast to PyLongObject*:

   (gdb) p *(PyLongObject*)some_python_integer
   $5 = {ob_base = {ob_base = {ob_refcnt = 8, ob_type = 0x3dad39f5e0}, ob_size = 1}, ob_digit = {42}}

A similar confusion can arise with the "str" type, where the output
looks a lot like GDB’s built-in printer for "char *":

   (gdb) p ptr_to_python_str
   $6 = '__builtins__'

The pretty-printer for "str" instances defaults to using
single-quotes (as does Python’s "repr" for strings) whereas the
standard printer for "char *" values uses double-quotes and contains
a hexadecimal address:

   (gdb) p ptr_to_char_star
   $7 = 0x6d72c0 "hello world"

Again, the implementation details can be revealed with a cast to
PyUnicodeObject*:

   (gdb) p *(PyUnicodeObject*)$6
   $8 = {ob_base = {ob_refcnt = 33, ob_type = 0x3dad3a95a0}, length = 12, str = 0x7ffff2128500, hash = 7065186196740147912, state = 1, defenc = 0x0}

"py-list"
---------

The extension adds a "py-list" command, which lists the Python source
code (if any) for the current frame in the selected thread. The
current line is marked with a “>”:

   (gdb) py-list
    901        if options.profile:
    902            options.profile = False
    903            profile_me()
    904            return
    905
   >906        u = UI()
    907        if not u.quit:
    908            try:
    909                gtk.main()
    910            except KeyboardInterrupt:
    911                # properly quit on a keyboard interrupt...

Use "py-list START" to list at a different line number within the
Python source, and "py-list START,END" to list a specific range of
lines within the Python source.

"py-up" and "py-down"
---------------------

The "py-up" and "py-down" commands are analogous to GDB’s regular
"up" and "down" commands, but try to move at the level of CPython
frames, rather than C frames.

GDB is not always able to read the relevant frame information,
depending on the optimization level with which CPython was compiled.
Internally, the commands look for C frames that are executing the
default frame evaluation function (that is, the core bytecode
interpreter loop within CPython) and look up the value of the related
"PyFrameObject *".

They emit the frame number (at the C level) within the thread. For
example:

   (gdb) py-up
   #37 Frame 0x9420b04, for file /usr/lib/python2.6/site-packages/gnome_sudoku/main.py, line 906, in start_game ()
       u = UI()
   (gdb) py-up
   #40 Frame 0x948e82c, for file /usr/lib/python2.6/site-packages/gnome_sudoku/gnome_sudoku.py, line 22, in start_game(main=)
       main.start_game()
   (gdb) py-up
   Unable to find an older python frame

so we’re at the top of the Python stack.

The frame numbers correspond to those displayed by GDB’s standard
"backtrace" command. The command skips C frames which are not
executing Python code.
Going back down: (gdb) py-down #37 Frame 0x9420b04, for file /usr/lib/python2.6/site-packages/gnome_sudoku/main.py, line 906, in start_game () u = UI() (gdb) py-down #34 (unable to read python frame information) (gdb) py-down #23 (unable to read python frame information) (gdb) py-down #19 (unable to read python frame information) (gdb) py-down #14 Frame 0x99262ac, for file /usr/lib/python2.6/site-packages/gnome_sudoku/game_selector.py, line 201, in run_swallowed_dialog (self=, puzzle=None, saved_games=[{'gsd.auto_fills': 0, 'tracking': {}, 'trackers': {}, 'notes': [], 'saved_at': 1270084485, 'game': '7 8 0 0 0 0 0 5 6 0 0 9 0 8 0 1 0 0 0 4 6 0 0 0 0 7 0 6 5 0 0 0 4 7 9 2 0 0 0 9 0 1 0 0 0 3 9 7 6 0 0 0 1 8 0 6 0 0 0 0 2 8 0 0 0 5 0 4 0 6 0 0 2 1 0 0 0 0 0 4 5\n7 8 0 0 0 0 0 5 6 0 0 9 0 8 0 1 0 0 0 4 6 0 0 0 0 7 0 6 5 1 8 3 4 7 9 2 0 0 0 9 0 1 0 0 0 3 9 7 6 0 0 0 1 8 0 6 0 0 0 0 2 8 0 0 0 5 0 4 0 6 0 0 2 1 0 0 0 0 0 4 5', 'gsd.impossible_hints': 0, 'timer.__absolute_start_time__': , 'gsd.hints': 0, 'timer.active_time': , 'timer.total_time': }], dialog=, saved_game_model=, sudoku_maker=, main_page=0) at remote 0x98fa6e4>, d=) gtk.main() (gdb) py-down #8 (unable to read python frame information) (gdb) py-down Unable to find a newer python frame and we’re at the bottom of the Python stack. Note that in Python 3.12 and newer, the same C stack frame can be used for multiple Python stack frames. This means that "py-up" and "py-down" may move multiple Python frames at once. For example: (gdb) py-up #6 Frame 0x7ffff7fb62b0, for file /tmp/rec.py, line 5, in recursive_function (n=0) time.sleep(5) #6 Frame 0x7ffff7fb6240, for file /tmp/rec.py, line 7, in recursive_function (n=1) recursive_function(n-1) #6 Frame 0x7ffff7fb61d0, for file /tmp/rec.py, line 7, in recursive_function (n=2) recursive_function(n-1) #6 Frame 0x7ffff7fb6160, for file /tmp/rec.py, line 7, in recursive_function (n=3) recursive_function(n-1) #6 Frame 0x7ffff7fb60f0, for file /tmp/rec.py, line 7, in recursive_function (n=4) recursive_function(n-1) #6 Frame 0x7ffff7fb6080, for file /tmp/rec.py, line 7, in recursive_function (n=5) recursive_function(n-1) #6 Frame 0x7ffff7fb6020, for file /tmp/rec.py, line 9, in () recursive_function(5) (gdb) py-up Unable to find an older python frame "py-bt" ------- The "py-bt" command attempts to display a Python-level backtrace of the current thread. For example: (gdb) py-bt #8 (unable to read python frame information) #11 Frame 0x9aead74, for file /usr/lib/python2.6/site-packages/gnome_sudoku/dialog_swallower.py, line 48, in run_dialog (self=, main_page=0) at remote 0x98fa6e4>, d=) gtk.main() #14 Frame 0x99262ac, for file /usr/lib/python2.6/site-packages/gnome_sudoku/game_selector.py, line 201, in run_swallowed_dialog (self=, puzzle=None, saved_games=[{'gsd.auto_fills': 0, 'tracking': {}, 'trackers': {}, 'notes': [], 'saved_at': 1270084485, 'game': '7 8 0 0 0 0 0 5 6 0 0 9 0 8 0 1 0 0 0 4 6 0 0 0 0 7 0 6 5 0 0 0 4 7 9 2 0 0 0 9 0 1 0 0 0 3 9 7 6 0 0 0 1 8 0 6 0 0 0 0 2 8 0 0 0 5 0 4 0 6 0 0 2 1 0 0 0 0 0 4 5\n7 8 0 0 0 0 0 5 6 0 0 9 0 8 0 1 0 0 0 4 6 0 0 0 0 7 0 6 5 1 8 3 4 7 9 2 0 0 0 9 0 1 0 0 0 3 9 7 6 0 0 0 1 8 0 6 0 0 0 0 2 8 0 0 0 5 0 4 0 6 0 0 2 1 0 0 0 0 0 4 5', 'gsd.impossible_hints': 0, 'timer.__absolute_start_time__': , 'gsd.hints': 0, 'timer.active_time': , 'timer.total_time': }], dialog=, saved_game_model=, sudoku_maker=) main.start_game() The frame numbers correspond to those displayed by GDB’s standard "backtrace" command. 
"py-print" ---------- The "py-print" command looks up a Python name and tries to print it. It looks in locals within the current thread, then globals, then finally builtins: (gdb) py-print self local 'self' = , main_page=0) at remote 0x98fa6e4> (gdb) py-print __name__ global '__name__' = 'gnome_sudoku.dialog_swallower' (gdb) py-print len builtin 'len' = (gdb) py-print scarlet_pimpernel 'scarlet_pimpernel' not found If the current C frame corresponds to multiple Python frames, "py- print" only considers the first one. "py-locals" ----------- The "py-locals" command looks up all Python locals within the current Python frame in the selected thread, and prints their representations: (gdb) py-locals self = , main_page=0) at remote 0x98fa6e4> d = If the current C frame corresponds to multiple Python frames, locals from all of them will be shown: (gdb) py-locals Locals for recursive_function n = 0 Locals for recursive_function n = 1 Locals for recursive_function n = 2 Locals for recursive_function n = 3 Locals for recursive_function n = 4 Locals for recursive_function n = 5 Locals for Use with GDB commands ===================== The extension commands complement GDB’s built-in commands. For example, you can use a frame numbers shown by "py-bt" with the "frame" command to go a specific frame within the selected thread, like this: (gdb) py-bt (output snipped) #68 Frame 0xaa4560, for file Lib/test/regrtest.py, line 1548, in () main() (gdb) frame 68 #68 0x00000000004cd1e6 in PyEval_EvalFrameEx (f=Frame 0xaa4560, for file Lib/test/regrtest.py, line 1548, in (), throwflag=0) at Python/ceval.c:2665 2665 x = call_function(&sp, oparg); (gdb) py-list 1543 # Run the tests in a context manager that temporary changes the CWD to a 1544 # temporary and writable directory. If it's not possible to create or 1545 # change the CWD, the original CWD will be used. The original CWD is 1546 # available from test_support.SAVEDCWD. 1547 with test_support.temp_cwd(TESTCWD, quiet=True): >1548 main() The "info threads" command will give you a list of the threads within the process, and you can use the "thread" command to select a different one: (gdb) info threads 105 Thread 0x7fffefa18710 (LWP 10260) sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86 104 Thread 0x7fffdf5fe710 (LWP 10259) sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86 * 1 Thread 0x7ffff7fe2700 (LWP 10145) 0x00000038e46d73e3 in select () at ../sysdeps/unix/syscall-template.S:82 You can use "thread apply all COMMAND" or ("t a a COMMAND" for short) to run a command on all threads. 
With "py-bt", this lets you see what every thread is doing at the Python level: (gdb) t a a py-bt Thread 105 (Thread 0x7fffefa18710 (LWP 10260)): #5 Frame 0x7fffd00019d0, for file /home/david/coding/python-svn/Lib/threading.py, line 155, in _acquire_restore (self=<_RLock(_Verbose__verbose=False, _RLock__owner=140737354016512, _RLock__block=, _RLock__count=1) at remote 0xd7ff40>, count_owner=(1, 140737213728528), count=1, owner=140737213728528) self.__block.acquire() #8 Frame 0x7fffac001640, for file /home/david/coding/python-svn/Lib/threading.py, line 269, in wait (self=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=140737354016512, _RLock__block=, _RLock__count=1) at remote 0xd7ff40>, acquire=, _is_owned=, _release_save=, release=, _acquire_restore=, _Verbose__verbose=False, _Condition__waiters=[]) at remote 0xd7fd10>, timeout=None, waiter=, saved_state=(1, 140737213728528)) self._acquire_restore(saved_state) #12 Frame 0x7fffb8001a10, for file /home/david/coding/python-svn/Lib/test/lock_tests.py, line 348, in f () cond.wait() #16 Frame 0x7fffb8001c40, for file /home/david/coding/python-svn/Lib/test/lock_tests.py, line 37, in task (tid=140737213728528) f() Thread 104 (Thread 0x7fffdf5fe710 (LWP 10259)): #5 Frame 0x7fffe4001580, for file /home/david/coding/python-svn/Lib/threading.py, line 155, in _acquire_restore (self=<_RLock(_Verbose__verbose=False, _RLock__owner=140737354016512, _RLock__block=, _RLock__count=1) at remote 0xd7ff40>, count_owner=(1, 140736940992272), count=1, owner=140736940992272) self.__block.acquire() #8 Frame 0x7fffc8002090, for file /home/david/coding/python-svn/Lib/threading.py, line 269, in wait (self=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=140737354016512, _RLock__block=, _RLock__count=1) at remote 0xd7ff40>, acquire=, _is_owned=, _release_save=, release=, _acquire_restore=, _Verbose__verbose=False, _Condition__waiters=[]) at remote 0xd7fd10>, timeout=None, waiter=, saved_state=(1, 140736940992272)) self._acquire_restore(saved_state) #12 Frame 0x7fffac001c90, for file /home/david/coding/python-svn/Lib/test/lock_tests.py, line 348, in f () cond.wait() #16 Frame 0x7fffac0011c0, for file /home/david/coding/python-svn/Lib/test/lock_tests.py, line 37, in task (tid=140736940992272) f() Thread 1 (Thread 0x7ffff7fe2700 (LWP 10145)): #5 Frame 0xcb5380, for file /home/david/coding/python-svn/Lib/test/lock_tests.py, line 16, in _wait () time.sleep(0.01) #8 Frame 0x7fffd00024a0, for file /home/david/coding/python-svn/Lib/test/lock_tests.py, line 378, in _check_notify (self=, skipped=[], _mirrorOutput=False, testsRun=39, buffer=False, _original_stderr=, _stdout_buffer=, _stderr_buffer=, _moduleSetUpFailed=False, expectedFailures=[], errors=[], _previousTestClass=, unexpectedSuccesses=[], failures=[], shouldStop=False, failfast=False) at remote 0xc185a0>, _threads=(0,), _cleanups=[], _type_equality_funcs={: , : , : , : , trace = 1; } python$target:::function-entry /self->trace/ { printf("%d\t%*s:", timestamp, 15, probename); printf("%*s", self->indent, ""); printf("%s:%s:%d\n", basename(copyinstr(arg0)), copyinstr(arg1), arg2); self->indent++; } python$target:::function-return /self->trace/ { self->indent--; printf("%d\t%*s:", timestamp, 15, probename); printf("%*s", self->indent, ""); printf("%s:%s:%d\n", basename(copyinstr(arg0)), copyinstr(arg1), arg2); } python$target:::function-return /copyinstr(arg1) == "start"/ { self->trace = 0; } It can be invoked like this: $ sudo dtrace -q -s call_stack.d -c "python3.6 
script.py" The output looks like this: 156641360502280 function-entry:call_stack.py:start:23 156641360518804 function-entry: call_stack.py:function_1:1 156641360532797 function-entry: call_stack.py:function_3:9 156641360546807 function-return: call_stack.py:function_3:10 156641360563367 function-return: call_stack.py:function_1:2 156641360578365 function-entry: call_stack.py:function_2:5 156641360591757 function-entry: call_stack.py:function_1:1 156641360605556 function-entry: call_stack.py:function_3:9 156641360617482 function-return: call_stack.py:function_3:10 156641360629814 function-return: call_stack.py:function_1:2 156641360642285 function-return: call_stack.py:function_2:6 156641360656770 function-entry: call_stack.py:function_3:9 156641360669707 function-return: call_stack.py:function_3:10 156641360687853 function-entry: call_stack.py:function_4:13 156641360700719 function-return: call_stack.py:function_4:14 156641360719640 function-entry: call_stack.py:function_5:18 156641360732567 function-return: call_stack.py:function_5:21 156641360747370 function-return:call_stack.py:start:28 Static SystemTap markers ======================== The low-level way to use the SystemTap integration is to use the static markers directly. This requires you to explicitly state the binary file containing them. For example, this SystemTap script can be used to show the call/return hierarchy of a Python script: probe process("python").mark("function__entry") { filename = user_string($arg1); funcname = user_string($arg2); lineno = $arg3; printf("%s => %s in %s:%d\\n", thread_indent(1), funcname, filename, lineno); } probe process("python").mark("function__return") { filename = user_string($arg1); funcname = user_string($arg2); lineno = $arg3; printf("%s <= %s in %s:%d\\n", thread_indent(-1), funcname, filename, lineno); } It can be invoked like this: $ stap \ show-call-hierarchy.stp \ -c "./python test.py" The output looks like this: 11408 python(8274): => __contains__ in Lib/_abcoll.py:362 11414 python(8274): => __getitem__ in Lib/os.py:425 11418 python(8274): => encode in Lib/os.py:490 11424 python(8274): <= encode in Lib/os.py:493 11428 python(8274): <= __getitem__ in Lib/os.py:426 11433 python(8274): <= __contains__ in Lib/_abcoll.py:366 where the columns are: * time in microseconds since start of script * name of executable * PID of process and the remainder indicates the call/return hierarchy as the script executes. For a "--enable-shared" build of CPython, the markers are contained within the libpython shared library, and the probe’s dotted path needs to reflect this. For example, this line from the above example: probe process("python").mark("function__entry") { should instead read: probe process("python").library("libpython3.6dm.so.1.0").mark("function__entry") { (assuming a debug build of CPython 3.6) Available static markers ======================== function__entry(str filename, str funcname, int lineno) This marker indicates that execution of a Python function has begun. It is only triggered for pure-Python (bytecode) functions. 
   The filename, function name, and line number are provided back to
   the tracing script as positional arguments, which must be accessed
   using "$arg1", "$arg2", "$arg3":

   * "$arg1" : "(const char *)" filename, accessible using
     "user_string($arg1)"

   * "$arg2" : "(const char *)" function name, accessible using
     "user_string($arg2)"

   * "$arg3" : "int" line number

function__return(str filename, str funcname, int lineno)

   This marker is the converse of "function__entry()", and indicates
   that execution of a Python function has ended (either via
   "return", or via an exception). It is only triggered for
   pure-Python (bytecode) functions.

   The arguments are the same as for "function__entry()".

line(str filename, str funcname, int lineno)

   This marker indicates a Python line is about to be executed. It is
   the equivalent of line-by-line tracing with a Python profiler. It
   is not triggered within C functions.

   The arguments are the same as for "function__entry()".

gc__start(int generation)

   Fires when the Python interpreter starts a garbage collection
   cycle. "arg0" is the generation to scan, like "gc.collect()".

gc__done(long collected)

   Fires when the Python interpreter finishes a garbage collection
   cycle. "arg0" is the number of collected objects.

import__find__load__start(str modulename)

   Fires before "importlib" attempts to find and load the module.
   "arg0" is the module name.

   Added in version 3.7.

import__find__load__done(str modulename, int found)

   Fires after "importlib"’s find_and_load function is called. "arg0"
   is the module name, "arg1" indicates if the module was
   successfully loaded.

   Added in version 3.7.

audit(str event, void *tuple)

   Fires when "sys.audit()" or "PySys_Audit()" is called. "arg0" is
   the event name as a C string, "arg1" is a "PyObject" pointer to a
   tuple object.

   Added in version 3.8.

SystemTap Tapsets
=================

The higher-level way to use the SystemTap integration is to use a
“tapset”: SystemTap’s equivalent of a library, which hides some of
the lower-level details of the static markers.

Here is a tapset file, based on a non-shared build of CPython:

   /* Provide a higher-level wrapping around the function__entry and
      function__return markers: */

   probe python.function.entry = process("python").mark("function__entry")
   {
       filename = user_string($arg1);
       funcname = user_string($arg2);
       lineno = $arg3;
       frameptr = $arg4
   }

   probe python.function.return = process("python").mark("function__return")
   {
       filename = user_string($arg1);
       funcname = user_string($arg2);
       lineno = $arg3;
       frameptr = $arg4
   }

If this file is installed in SystemTap’s tapset directory (e.g.
"/usr/share/systemtap/tapset"), then these additional probepoints
become available:

python.function.entry(str filename, str funcname, int lineno, frameptr)

   This probe point indicates that execution of a Python function has
   begun. It is only triggered for pure-Python (bytecode) functions.

python.function.return(str filename, str funcname, int lineno, frameptr)

   This probe point is the converse of "python.function.entry", and
   indicates that execution of a Python function has ended (either
   via "return", or via an exception). It is only triggered for
   pure-Python (bytecode) functions.
Examples
========

This SystemTap script uses the tapset above to more cleanly implement
the example given above of tracing the Python function-call
hierarchy, without needing to directly name the static markers:

   probe python.function.entry
   {
       printf("%s => %s in %s:%d\n",
              thread_indent(1), funcname, filename, lineno);
   }

   probe python.function.return
   {
       printf("%s <= %s in %s:%d\n",
              thread_indent(-1), funcname, filename, lineno);
   }

The following script uses the tapset above to provide a top-like view
of all running CPython code, showing the top 20 most frequently
entered bytecode frames, each second, across the whole system:

   global fn_calls;

   probe python.function.entry
   {
       fn_calls[pid(), filename, funcname, lineno] += 1;
   }

   probe timer.ms(1000) {
       printf("\033[2J\033[1;1H") /* clear screen */
       printf("%6s %80s %6s %30s %6s\n",
              "PID", "FILENAME", "LINE", "FUNCTION", "CALLS")
       foreach ([pid, filename, funcname, lineno] in fn_calls- limit 20) {
           printf("%6d %80s %6d %30s %6d\n",
                  pid, filename, lineno, funcname,
                  fn_calls[pid, filename, funcname, lineno]);
       }
       delete fn_calls;
   }

An introduction to the ipaddress module
***************************************

author: Peter Moody

author: Nick Coghlan

Overview
^^^^^^^^

This document aims to provide a gentle introduction to the
"ipaddress" module. It is aimed primarily at users who aren’t already
familiar with IP networking terminology, but may also be useful to
network engineers wanting an overview of how "ipaddress" represents
IP network addressing concepts.

Creating Address/Network/Interface objects
==========================================

Since "ipaddress" is a module for inspecting and manipulating IP
addresses, the first thing you’ll want to do is create some objects.
You can use "ipaddress" to create objects from strings and integers.

A Note on IP Versions
---------------------

For readers that aren’t particularly familiar with IP addressing,
it’s important to know that the Internet Protocol (IP) is currently
in the process of moving from version 4 of the protocol to version 6.
This transition is occurring largely because version 4 of the
protocol doesn’t provide enough addresses to handle the needs of the
whole world, especially given the increasing number of devices with
direct connections to the internet.

Explaining the details of the differences between the two versions of
the protocol is beyond the scope of this introduction, but readers
need to at least be aware that these two versions exist, and it will
sometimes be necessary to force the use of one version or the other.

IP Host Addresses
-----------------

Addresses, often referred to as “host addresses”, are the most basic
unit when working with IP addressing. The simplest way to create
addresses is to use the "ipaddress.ip_address()" factory function,
which automatically determines whether to create an IPv4 or IPv6
address based on the passed in value:

   >>> ipaddress.ip_address('192.0.2.1')
   IPv4Address('192.0.2.1')
   >>> ipaddress.ip_address('2001:DB8::1')
   IPv6Address('2001:db8::1')

Addresses can also be created directly from integers. Values that
will fit within 32 bits are assumed to be IPv4 addresses:

   >>> ipaddress.ip_address(3221225985)
   IPv4Address('192.0.2.1')
   >>> ipaddress.ip_address(42540766411282592856903984951653826561)
   IPv6Address('2001:db8::1')

To force the use of IPv4 or IPv6 addresses, the relevant classes can
be invoked directly.
This is particularly useful to force creation of IPv6 addresses for
small integers:

   >>> ipaddress.ip_address(1)
   IPv4Address('0.0.0.1')
   >>> ipaddress.IPv4Address(1)
   IPv4Address('0.0.0.1')
   >>> ipaddress.IPv6Address(1)
   IPv6Address('::1')

Defining Networks
-----------------

Host addresses are usually grouped together into IP networks, so
"ipaddress" provides a way to create, inspect and manipulate network
definitions. IP network objects are constructed from strings that
define the range of host addresses that are part of that network. The
simplest form for that information is a “network address/network
prefix” pair, where the prefix defines the number of leading bits
that are compared to determine whether or not an address is part of
the network and the network address defines the expected value of
those bits.

As for addresses, a factory function is provided that determines the
correct IP version automatically:

   >>> ipaddress.ip_network('192.0.2.0/24')
   IPv4Network('192.0.2.0/24')
   >>> ipaddress.ip_network('2001:db8::0/96')
   IPv6Network('2001:db8::/96')

Network objects cannot have any host bits set. The practical effect
of this is that "192.0.2.1/24" does not describe a network. Such
definitions are referred to as interface objects since the
ip-on-a-network notation is commonly used to describe network
interfaces of a computer on a given network and are described further
in the next section.

By default, attempting to create a network object with host bits set
will result in "ValueError" being raised. To request that the
additional bits instead be coerced to zero, the flag "strict=False"
can be passed to the constructor:

   >>> ipaddress.ip_network('192.0.2.1/24')
   Traceback (most recent call last):
      ...
   ValueError: 192.0.2.1/24 has host bits set
   >>> ipaddress.ip_network('192.0.2.1/24', strict=False)
   IPv4Network('192.0.2.0/24')

While the string form offers significantly more flexibility, networks
can also be defined with integers, just like host addresses. In this
case, the network is considered to contain only the single address
identified by the integer, so the network prefix includes the entire
network address:

   >>> ipaddress.ip_network(3221225984)
   IPv4Network('192.0.2.0/32')
   >>> ipaddress.ip_network(42540766411282592856903984951653826560)
   IPv6Network('2001:db8::/128')

As with addresses, creation of a particular kind of network can be
forced by calling the class constructor directly instead of using the
factory function.

Host Interfaces
---------------

As mentioned just above, if you need to describe an address on a
particular network, neither the address nor the network classes are
sufficient. Notation like "192.0.2.1/24" is commonly used by network
engineers and the people who write tools for firewalls and routers as
shorthand for “the host "192.0.2.1" on the network "192.0.2.0/24"”.
Accordingly, "ipaddress" provides a set of hybrid classes that
associate an address with a particular network. The interface for
creation is identical to that for defining network objects, except
that the address portion isn’t constrained to being a network
address.

   >>> ipaddress.ip_interface('192.0.2.1/24')
   IPv4Interface('192.0.2.1/24')
   >>> ipaddress.ip_interface('2001:db8::1/96')
   IPv6Interface('2001:db8::1/96')

Integer inputs are accepted (as with networks), and use of a
particular IP version can be forced by calling the relevant
constructor directly.
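For example (a brief sketch paralleling the address and network
examples above; the specific values are just illustrative):

   >>> ipaddress.ip_interface(3221225985)
   IPv4Interface('192.0.2.1/32')
   >>> ipaddress.IPv6Interface(1)
   IPv6Interface('::1/128')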
Inspecting Address/Network/Interface Objects ============================================ You’ve gone to the trouble of creating an IPv(4|6)(Address|Network|Interface) object, so you probably want to get information about it. "ipaddress" tries to make doing this easy and intuitive. Extracting the IP version: >>> addr4 = ipaddress.ip_address('192.0.2.1') >>> addr6 = ipaddress.ip_address('2001:db8::1') >>> addr6.version 6 >>> addr4.version 4 Obtaining the network from an interface: >>> host4 = ipaddress.ip_interface('192.0.2.1/24') >>> host4.network IPv4Network('192.0.2.0/24') >>> host6 = ipaddress.ip_interface('2001:db8::1/96') >>> host6.network IPv6Network('2001:db8::/96') Finding out how many individual addresses are in a network: >>> net4 = ipaddress.ip_network('192.0.2.0/24') >>> net4.num_addresses 256 >>> net6 = ipaddress.ip_network('2001:db8::0/96') >>> net6.num_addresses 4294967296 Iterating through the “usable” addresses on a network: >>> net4 = ipaddress.ip_network('192.0.2.0/24') >>> for x in net4.hosts(): ... print(x) 192.0.2.1 192.0.2.2 192.0.2.3 192.0.2.4 ... 192.0.2.252 192.0.2.253 192.0.2.254 Obtaining the netmask (i.e. set bits corresponding to the network prefix) or the hostmask (any bits that are not part of the netmask): >>> net4 = ipaddress.ip_network('192.0.2.0/24') >>> net4.netmask IPv4Address('255.255.255.0') >>> net4.hostmask IPv4Address('0.0.0.255') >>> net6 = ipaddress.ip_network('2001:db8::0/96') >>> net6.netmask IPv6Address('ffff:ffff:ffff:ffff:ffff:ffff::') >>> net6.hostmask IPv6Address('::ffff:ffff') Exploding or compressing the address: >>> addr6.exploded '2001:0db8:0000:0000:0000:0000:0000:0001' >>> addr6.compressed '2001:db8::1' >>> net6.exploded '2001:0db8:0000:0000:0000:0000:0000:0000/96' >>> net6.compressed '2001:db8::/96' While IPv4 doesn’t support explosion or compression, the associated objects still provide the relevant properties so that version neutral code can easily ensure the most concise or most verbose form is used for IPv6 addresses while still correctly handling IPv4 addresses. Networks as lists of Addresses ============================== It’s sometimes useful to treat networks as lists. This means it is possible to index them like this: >>> net4[1] IPv4Address('192.0.2.1') >>> net4[-1] IPv4Address('192.0.2.255') >>> net6[1] IPv6Address('2001:db8::1') >>> net6[-1] IPv6Address('2001:db8::ffff:ffff') It also means that network objects lend themselves to using the list membership test syntax like this: if address in network: # do something Containment testing is done efficiently based on the network prefix: >>> addr4 = ipaddress.ip_address('192.0.2.1') >>> addr4 in ipaddress.ip_network('192.0.2.0/24') True >>> addr4 in ipaddress.ip_network('192.0.3.0/24') False Comparisons =========== "ipaddress" provides some simple, hopefully intuitive ways to compare objects, where it makes sense: >>> ipaddress.ip_address('192.0.2.1') < ipaddress.ip_address('192.0.2.2') True A "TypeError" exception is raised if you try to compare objects of different versions or different types. Using IP Addresses with other modules ===================================== Other modules that use IP addresses (such as "socket") usually won’t accept objects from this module directly. 
Instead, they must be coerced to an integer or string that the other
module will accept:

   >>> addr4 = ipaddress.ip_address('192.0.2.1')
   >>> str(addr4)
   '192.0.2.1'
   >>> int(addr4)
   3221225985

Getting more detail when instance creation fails
================================================

When creating address/network/interface objects using the
version-agnostic factory functions, any errors will be reported as
"ValueError" with a generic error message that simply says the passed
in value was not recognized as an object of that type. The lack of a
specific error is because it’s necessary to know whether the value is
*supposed* to be IPv4 or IPv6 in order to provide more detail on why
it has been rejected.

To support use cases where it is useful to have access to this
additional detail, the individual class constructors actually raise
the "ValueError" subclasses "ipaddress.AddressValueError" and
"ipaddress.NetmaskValueError" to indicate exactly which part of the
definition failed to parse correctly.

The error messages are significantly more detailed when using the
class constructors directly. For example:

   >>> ipaddress.ip_address("192.168.0.256")
   Traceback (most recent call last):
      ...
   ValueError: '192.168.0.256' does not appear to be an IPv4 or IPv6 address
   >>> ipaddress.IPv4Address("192.168.0.256")
   Traceback (most recent call last):
      ...
   ipaddress.AddressValueError: Octet 256 (> 255) not permitted in '192.168.0.256'

   >>> ipaddress.ip_network("192.168.0.1/64")
   Traceback (most recent call last):
      ...
   ValueError: '192.168.0.1/64' does not appear to be an IPv4 or IPv6 network
   >>> ipaddress.IPv4Network("192.168.0.1/64")
   Traceback (most recent call last):
      ...
   ipaddress.NetmaskValueError: '64' is not a valid netmask

However, both of the module-specific exceptions have "ValueError" as
their parent class, so if you’re not concerned with the particular
type of error, you can still write code like the following:

   try:
       network = ipaddress.IPv4Network(address)
   except ValueError:
       print('address/netmask is invalid for IPv4:', address)

Isolating Extension Modules
***************************

Abstract
^^^^^^^^

Traditionally, state belonging to Python extension modules was kept
in C "static" variables, which have process-wide scope. This document
describes problems of such per-process state and shows a safer way:
per-module state.

The document also describes how to switch to per-module state where
possible. This transition involves allocating space for that state,
potentially switching from static types to heap types, and—perhaps
most importantly—accessing per-module state from code.

Who should read this
====================

This guide is written for maintainers of C-API extensions who would
like to make that extension safer to use in applications where Python
itself is used as a library.

Background
==========

An *interpreter* is the context in which Python code runs. It
contains configuration (e.g. the import path) and runtime state (e.g.
the set of imported modules).

Python supports running multiple interpreters in one process. There
are two cases to think about—users may run interpreters:

* in sequence, with several "Py_InitializeEx()"/"Py_FinalizeEx()"
  cycles, and

* in parallel, managing “sub-interpreters” using
  "Py_NewInterpreter()"/"Py_EndInterpreter()".

Both cases (and combinations of them) would be most useful when
embedding Python within a library. Libraries generally shouldn’t make
assumptions about the application that uses them, which include
assuming a process-wide “main Python interpreter”.
Historically, Python extension modules don’t handle this use case
well. Many extension modules (and even some stdlib modules) use
*per-process* global state, because C "static" variables are
extremely easy to use. Thus, data that should be specific to an
interpreter ends up being shared between interpreters. Unless the
extension developer is careful, it is very easy to introduce edge
cases that lead to crashes when a module is loaded in more than one
interpreter in the same process.

Unfortunately, *per-interpreter* state is not easy to achieve.
Extension authors tend to not keep multiple interpreters in mind when
developing, and it is currently cumbersome to test the behavior.

Enter Per-Module State
----------------------

Instead of focusing on per-interpreter state, Python’s C API is
evolving to better support the more granular *per-module* state. This
means that C-level data should be attached to a *module object*. Each
interpreter creates its own module object, keeping the data separate.
For testing the isolation, multiple module objects corresponding to a
single extension can even be loaded in a single interpreter.

Per-module state provides an easy way to think about lifetime and
resource ownership: the extension module will initialize when a
module object is created, and clean up when it’s freed. In this
regard, a module is just like any other PyObject*; there are no “on
interpreter shutdown” hooks to think—or forget—about.

Note that there are use cases for different kinds of “globals”:
per-process, per-interpreter, per-thread or per-task state. With
per-module state as the default, these are still possible, but you
should treat them as exceptional cases: if you need them, you should
give them additional care and testing. (Note that this guide does not
cover them.)

Isolated Module Objects
-----------------------

The key point to keep in mind when developing an extension module is
that several module objects can be created from a single shared
library. For example:

   >>> import sys
   >>> import binascii
   >>> old_binascii = binascii
   >>> del sys.modules['binascii']
   >>> import binascii  # create a new module object
   >>> old_binascii == binascii
   False

As a rule of thumb, the two modules should be completely independent.
All objects and state specific to the module should be encapsulated
within the module object, not shared with other module objects, and
cleaned up when the module object is deallocated. Since this is just
a rule of thumb, exceptions are possible (see Managing Global State),
but they will need more thought and attention to edge cases.

While some modules could do with less stringent restrictions,
isolated modules make it easier to set clear expectations and
guidelines that work across a variety of use cases.

Surprising Edge Cases
---------------------

Note that isolated modules do create some surprising edge cases. Most
notably, each module object will typically not share its classes and
exceptions with other similar modules. Continuing from the example
above, note that "old_binascii.Error" and "binascii.Error" are
separate objects. In the following code, the exception is *not*
caught:

   >>> old_binascii.Error == binascii.Error
   False
   >>> try:
   ...     old_binascii.unhexlify(b'qwertyuiop')
   ... except binascii.Error:
   ...     print('boo')
   ...
   Traceback (most recent call last):
     File "<stdin>", line 2, in <module>
   binascii.Error: Non-hexadecimal digit found

This is expected. Notice that pure-Python modules behave the same
way: it is a part of how Python works.
The goal is to make extension modules safe at the C level, not to
make hacks behave intuitively. Mutating "sys.modules" “manually”
counts as a hack.

Making Modules Safe with Multiple Interpreters
==============================================

Managing Global State
---------------------

Sometimes, the state associated with a Python module is not specific
to that module, but to the entire process (or something else “more
global” than a module). For example:

* The "readline" module manages *the* terminal.

* A module running on a circuit board wants to control *the* on-board
  LED.

In these cases, the Python module should provide *access* to the
global state, rather than *own* it. If possible, write the module so
that multiple copies of it can access the state independently (along
with other libraries, whether for Python or other languages). If that
is not possible, consider explicit locking.

If it is necessary to use process-global state, the simplest way to
avoid issues with multiple interpreters is to explicitly prevent a
module from being loaded more than once per process—see Opt-Out:
Limiting to One Module Object per Process.

Managing Per-Module State
-------------------------

To use per-module state, use multi-phase extension module
initialization. This signals that your module supports multiple
interpreters correctly.

Set "PyModuleDef.m_size" to a positive number to request that many
bytes of storage local to the module. Usually, this will be set to
the size of some module-specific "struct", which can store all of the
module’s C-level state. In particular, it is where you should put
pointers to classes (including exceptions, but excluding static
types) and settings (e.g. "csv"’s "field_size_limit") which the C
code needs to function.

Note: Another option is to store state in the module’s "__dict__",
  but you must avoid crashing when users modify "__dict__" from
  Python code. This usually means error- and type-checking at the C
  level, which is easy to get wrong and hard to test sufficiently.
  However, if module state is not needed in C code, storing it in
  "__dict__" only is a good idea.

If the module state includes "PyObject" pointers, the module object
must hold references to those objects and implement the module-level
hooks "m_traverse", "m_clear" and "m_free". These work like
"tp_traverse", "tp_clear" and "tp_free" of a class. Adding them will
require some work and make the code longer; this is the price for
modules which can be unloaded cleanly.

An example of a module with per-module state is currently available
as xxlimited; example module initialization shown at the bottom of
the file.

Opt-Out: Limiting to One Module Object per Process
--------------------------------------------------

A non-negative "PyModuleDef.m_size" signals that a module supports
multiple interpreters correctly. If this is not yet the case for your
module, you can explicitly make your module loadable only once per
process. For example:

   // A process-wide flag
   static int loaded = 0;

   // Mutex to provide thread safety (only needed for free-threaded Python)
   static PyMutex modinit_mutex = {0};

   static int
   exec_module(PyObject* module)
   {
       PyMutex_Lock(&modinit_mutex);
       if (loaded) {
           PyMutex_Unlock(&modinit_mutex);
           PyErr_SetString(PyExc_ImportError,
                           "cannot load module more than once per process");
           return -1;
       }
       loaded = 1;
       PyMutex_Unlock(&modinit_mutex);
       // ... rest of initialization
   }

If your module’s "PyModuleDef.m_clear" function is able to prepare
for future re-initialization, it should clear the "loaded" flag.
In this case, your module won’t support multiple instances existing
*concurrently*, but it will, for example, support being loaded after
Python runtime shutdown ("Py_FinalizeEx()") and re-initialization
("Py_Initialize()").

Module State Access from Functions
----------------------------------

Accessing the state from module-level functions is straightforward.
Functions get the module object as their first argument; for
extracting the state, you can use "PyModule_GetState":

   static PyObject *
   func(PyObject *module, PyObject *args)
   {
       my_struct *state = (my_struct*)PyModule_GetState(module);
       if (state == NULL) {
           return NULL;
       }
       // ... rest of logic
   }

Note: "PyModule_GetState" may return "NULL" without setting an
  exception if there is no module state, i.e. "PyModuleDef.m_size"
  was zero. In your own module, you’re in control of "m_size", so
  this is easy to prevent.

Heap Types
==========

Traditionally, types defined in C code are *static*; that is,
"static PyTypeObject" structures defined directly in code and
initialized using "PyType_Ready()".

Such types are necessarily shared across the process. Sharing them
between module objects requires paying attention to any state they
own or access. To limit the possible issues, static types are
immutable at the Python level: for example, you can’t set
"str.myattribute = 123".

**CPython implementation detail:** Sharing truly immutable objects
between interpreters is fine, as long as they don’t provide access to
mutable objects. However, in CPython, every Python object has a
mutable implementation detail: the reference count. Changes to the
refcount are guarded by the GIL. Thus, code that shares any Python
objects across interpreters implicitly depends on CPython’s current,
process-wide GIL.

Because they are immutable and process-global, static types cannot
access “their” module state. If any method of such a type requires
access to module state, the type must be converted to a
*heap-allocated type*, or *heap type* for short. These correspond
more closely to classes created by Python’s "class" statement.

For new modules, using heap types by default is a good rule of thumb.

Changing Static Types to Heap Types
-----------------------------------

Static types can be converted to heap types, but note that the heap
type API was not designed for “lossless” conversion from static
types—that is, creating a type that works exactly like a given static
type. So, when rewriting the class definition in a new API, you are
likely to unintentionally change a few details (e.g. pickleability or
inherited slots). Always test the details that are important to you.

Watch out for the following two points in particular (but note that
this is not a comprehensive list):

* Unlike static types, heap type objects are mutable by default. Use
  the "Py_TPFLAGS_IMMUTABLETYPE" flag to prevent mutability.

* Heap types inherit "tp_new" by default, so it may become possible
  to instantiate them from Python code. You can prevent this with the
  "Py_TPFLAGS_DISALLOW_INSTANTIATION" flag.

Defining Heap Types
-------------------

Heap types can be created by filling a "PyType_Spec" structure, a
description or “blueprint” of a class, and calling
"PyType_FromModuleAndSpec()" to construct a new class object.

Note: Other functions, like "PyType_FromSpec()", can also create
  heap types, but "PyType_FromModuleAndSpec()" associates the module
  with the class, allowing access to the module state from methods.
The class should generally be stored in *both* the module state (for
safe access from C) and the module’s "__dict__" (for access from
Python code).

Garbage-Collection Protocol
---------------------------

Instances of heap types hold a reference to their type. This ensures
that the type isn’t destroyed before all its instances are, but may
result in reference cycles that need to be broken by the garbage
collector.

To avoid memory leaks, instances of heap types must implement the
garbage collection protocol. That is, heap types should:

* Have the "Py_TPFLAGS_HAVE_GC" flag.

* Define a traverse function using "Py_tp_traverse", which visits the
  type (e.g. using "Py_VISIT(Py_TYPE(self))").

Please refer to the documentation of "Py_TPFLAGS_HAVE_GC" and
"tp_traverse" for additional considerations.

The API for defining heap types grew organically, leaving it somewhat
awkward to use in its current state. The following sections will
guide you through common issues.

"tp_traverse" in Python 3.8 and lower
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The requirement to visit the type from "tp_traverse" was added in
Python 3.9. If you support Python 3.8 and lower, the traverse
function must *not* visit the type, so it must be more complicated:

   static int
   my_traverse(PyObject *self, visitproc visit, void *arg)
   {
       if (Py_Version >= 0x03090000) {
           Py_VISIT(Py_TYPE(self));
       }
       return 0;
   }

Unfortunately, "Py_Version" was only added in Python 3.11. As a
replacement, use:

* "PY_VERSION_HEX", if not using the stable ABI, or

* "sys.version_info" (via "PySys_GetObject()" and
  "PyArg_ParseTuple()").

Delegating "tp_traverse"
~~~~~~~~~~~~~~~~~~~~~~~~

If your traverse function delegates to the "tp_traverse" of its base
class (or another type), ensure that "Py_TYPE(self)" is visited only
once. Note that only heap types are expected to visit the type in
"tp_traverse".

For example, if your traverse function includes:

   base->tp_traverse(self, visit, arg)

…and "base" may be a static type, then it should also include:

   if (base->tp_flags & Py_TPFLAGS_HEAPTYPE) {
       // a heap type's tp_traverse already visited Py_TYPE(self)
   } else {
       if (Py_Version >= 0x03090000) {
           Py_VISIT(Py_TYPE(self));
       }
   }

It is not necessary to handle the type’s reference count in "tp_new"
and "tp_clear".

Defining "tp_dealloc"
~~~~~~~~~~~~~~~~~~~~~

If your type has a custom "tp_dealloc" function, it needs to:

* call "PyObject_GC_UnTrack()" before any fields are invalidated, and

* decrement the reference count of the type.

To keep the type valid while "tp_free" is called, the type’s refcount
needs to be decremented *after* the instance is deallocated. For
example:

   static void
   my_dealloc(PyObject *self)
   {
       PyObject_GC_UnTrack(self);
       ...
       PyTypeObject *type = Py_TYPE(self);
       type->tp_free(self);
       Py_DECREF(type);
   }

The default "tp_dealloc" function does this, so if your type does
*not* override "tp_dealloc" you don’t need to add it.

Not overriding "tp_free"
~~~~~~~~~~~~~~~~~~~~~~~~

The "tp_free" slot of a heap type must be set to "PyObject_GC_Del()".
This is the default; do not override it.

Avoiding "PyObject_New"
~~~~~~~~~~~~~~~~~~~~~~~

GC-tracked objects need to be allocated using GC-aware functions.

If you use "PyObject_New()" or "PyObject_NewVar()":

* Get and call the type’s "tp_alloc" slot, if possible. That is,
  replace "TYPE *o = PyObject_New(TYPE, typeobj)" with:

     TYPE *o = typeobj->tp_alloc(typeobj, 0);

  Replace "o = PyObject_NewVar(TYPE, typeobj, size)" with the same,
  but use size instead of the 0.
* If the above is not possible (e.g. inside a custom "tp_alloc"),
  call "PyObject_GC_New()" or "PyObject_GC_NewVar()":

     TYPE *o = PyObject_GC_New(TYPE, typeobj);

     TYPE *o = PyObject_GC_NewVar(TYPE, typeobj, size);

Module State Access from Classes
--------------------------------

If you have a type object defined with "PyType_FromModuleAndSpec()",
you can call "PyType_GetModule()" to get the associated module, and
then "PyModule_GetState()" to get the module’s state.

To save some tedious error-handling boilerplate code, you can combine
these two steps with "PyType_GetModuleState()", resulting in:

   my_struct *state = (my_struct*)PyType_GetModuleState(type);
   if (state == NULL) {
       return NULL;
   }

Module State Access from Regular Methods
----------------------------------------

Accessing the module-level state from methods of a class is somewhat
more complicated, but is possible thanks to API introduced in Python
3.9. To get the state, you need to first get the *defining class*,
and then get the module state from it.

The largest roadblock is getting *the class a method was defined in*,
or that method’s “defining class” for short. The defining class can
have a reference to the module it is part of.

Do not confuse the defining class with "Py_TYPE(self)". If the method
is called on a *subclass* of your type, "Py_TYPE(self)" will refer to
that subclass, which may be defined in a different module than yours.

Note: The following Python code can illustrate the concept.
  "Base.get_defining_class" returns "Base" even if
  "type(self) == Sub":

     class Base:
         def get_type_of_self(self):
             return type(self)

         def get_defining_class(self):
             return __class__

     class Sub(Base):
         pass

For a method to get its “defining class”, it must use the
METH_METHOD | METH_FASTCALL | METH_KEYWORDS "calling convention" and
the corresponding "PyCMethod" signature:

   PyObject *PyCMethod(
       PyObject *self,               // object the method was called on
       PyTypeObject *defining_class, // defining class
       PyObject *const *args,        // C array of arguments
       Py_ssize_t nargs,             // length of "args"
       PyObject *kwnames)            // NULL, or dict of keyword arguments

Once you have the defining class, call "PyType_GetModuleState()" to
get the state of its associated module. For example:

   static PyObject *
   example_method(PyObject *self,
                  PyTypeObject *defining_class,
                  PyObject *const *args,
                  Py_ssize_t nargs,
                  PyObject *kwnames)
   {
       my_struct *state = (my_struct*)PyType_GetModuleState(defining_class);
       if (state == NULL) {
           return NULL;
       }
       ... // rest of logic
   }

   PyDoc_STRVAR(example_method_doc, "...");

   static PyMethodDef my_methods[] = {
       {"example_method",
        (PyCFunction)(void(*)(void))example_method,
        METH_METHOD|METH_FASTCALL|METH_KEYWORDS,
        example_method_doc},
       {NULL},
   };

Module State Access from Slot Methods, Getters and Setters
----------------------------------------------------------

Note: This is new in Python 3.11.

Slot methods—the fast C equivalents for special methods, such as
"nb_add" for "__add__" or "tp_new" for initialization—have a very
simple API that doesn’t allow passing in the defining class, unlike
with "PyCMethod". The same goes for getters and setters defined with
"PyGetSetDef".

To access the module state in these cases, use the
"PyType_GetModuleByDef()" function, and pass in the module
definition. Once you have the module, call "PyModule_GetState()" to
get the state:

   PyObject *module = PyType_GetModuleByDef(Py_TYPE(self), &module_def);
   my_struct *state = (my_struct*)PyModule_GetState(module);
   if (state == NULL) {
       return NULL;
   }

"PyType_GetModuleByDef()" works by searching the *method resolution
order* (i.e. all superclasses) for the first superclass that has a
corresponding module.
Note: In very exotic cases (inheritance chains spanning multiple
  modules created from the same definition),
  "PyType_GetModuleByDef()" might not return the module of the true
  defining class. However, it will always return a module with the
  same definition, ensuring a compatible C memory layout.

Lifetime of the Module State
----------------------------

When a module object is garbage-collected, its module state is freed.
For each pointer to (a part of) the module state, you must hold a
reference to the module object.

Usually this is not an issue, because types created with
"PyType_FromModuleAndSpec()", and their instances, hold a reference
to the module. However, you must be careful in reference counting
when you reference module state from other places, such as callbacks
for external libraries.

Open Issues
===========

Several issues around per-module state and heap types are still open.
Discussions about improving the situation are best held on the
capi-sig mailing list.

Per-Class Scope
---------------

It is currently (as of Python 3.11) not possible to attach state to
individual *types* without relying on CPython implementation details
(which may change in the future—perhaps, ironically, to allow a
proper solution for per-class scope).

Lossless Conversion to Heap Types
---------------------------------

The heap type API was not designed for “lossless” conversion from
static types; that is, creating a type that works exactly like a
given static type.

Logging Cookbook
****************

Author: Vinay Sajip

This page contains a number of recipes related to logging, which have
been found useful in the past. For links to tutorial and reference
information, please see Other resources.

Using logging in multiple modules
=================================

Multiple calls to "logging.getLogger('someLogger')" return a
reference to the same logger object. This is true not only within the
same module, but also across modules as long as it is in the same
Python interpreter process. It is true for references to the same
object; additionally, application code can define and configure a
parent logger in one module and create (but not configure) a child
logger in a separate module, and all logger calls to the child will
pass up to the parent.
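For instance, a quick interactive check of this behaviour (a minimal
sketch; the logger names are just illustrative):

   >>> import logging
   >>> logging.getLogger('spam_application') is logging.getLogger('spam_application')
   True
   >>> child = logging.getLogger('spam_application.auxiliary')
   >>> child.parent is logging.getLogger('spam_application')
   True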
Here is a main module: import logging import auxiliary_module # create logger with 'spam_application' logger = logging.getLogger('spam_application') logger.setLevel(logging.DEBUG) # create file handler which logs even debug messages fh = logging.FileHandler('spam.log') fh.setLevel(logging.DEBUG) # create console handler with a higher log level ch = logging.StreamHandler() ch.setLevel(logging.ERROR) # create formatter and add it to the handlers formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') fh.setFormatter(formatter) ch.setFormatter(formatter) # add the handlers to the logger logger.addHandler(fh) logger.addHandler(ch) logger.info('creating an instance of auxiliary_module.Auxiliary') a = auxiliary_module.Auxiliary() logger.info('created an instance of auxiliary_module.Auxiliary') logger.info('calling auxiliary_module.Auxiliary.do_something') a.do_something() logger.info('finished auxiliary_module.Auxiliary.do_something') logger.info('calling auxiliary_module.some_function()') auxiliary_module.some_function() logger.info('done with auxiliary_module.some_function()') Here is the auxiliary module: import logging # create logger module_logger = logging.getLogger('spam_application.auxiliary') class Auxiliary: def __init__(self): self.logger = logging.getLogger('spam_application.auxiliary.Auxiliary') self.logger.info('creating an instance of Auxiliary') def do_something(self): self.logger.info('doing something') a = 1 + 1 self.logger.info('done doing something') def some_function(): module_logger.info('received a call to "some_function"') The output looks like this: 2005-03-23 23:47:11,663 - spam_application - INFO - creating an instance of auxiliary_module.Auxiliary 2005-03-23 23:47:11,665 - spam_application.auxiliary.Auxiliary - INFO - creating an instance of Auxiliary 2005-03-23 23:47:11,665 - spam_application - INFO - created an instance of auxiliary_module.Auxiliary 2005-03-23 23:47:11,668 - spam_application - INFO - calling auxiliary_module.Auxiliary.do_something 2005-03-23 23:47:11,668 - spam_application.auxiliary.Auxiliary - INFO - doing something 2005-03-23 23:47:11,669 - spam_application.auxiliary.Auxiliary - INFO - done doing something 2005-03-23 23:47:11,670 - spam_application - INFO - finished auxiliary_module.Auxiliary.do_something 2005-03-23 23:47:11,671 - spam_application - INFO - calling auxiliary_module.some_function() 2005-03-23 23:47:11,672 - spam_application.auxiliary - INFO - received a call to 'some_function' 2005-03-23 23:47:11,673 - spam_application - INFO - done with auxiliary_module.some_function() Logging from multiple threads ============================= Logging from multiple threads requires no special effort. 
The following example shows logging from the main (initial) thread and another thread: import logging import threading import time def worker(arg): while not arg['stop']: logging.debug('Hi from myfunc') time.sleep(0.5) def main(): logging.basicConfig(level=logging.DEBUG, format='%(relativeCreated)6d %(threadName)s %(message)s') info = {'stop': False} thread = threading.Thread(target=worker, args=(info,)) thread.start() while True: try: logging.debug('Hello from main') time.sleep(0.75) except KeyboardInterrupt: info['stop'] = True break thread.join() if __name__ == '__main__': main() When run, the script should print something like the following: 0 Thread-1 Hi from myfunc 3 MainThread Hello from main 505 Thread-1 Hi from myfunc 755 MainThread Hello from main 1007 Thread-1 Hi from myfunc 1507 MainThread Hello from main 1508 Thread-1 Hi from myfunc 2010 Thread-1 Hi from myfunc 2258 MainThread Hello from main 2512 Thread-1 Hi from myfunc 3009 MainThread Hello from main 3013 Thread-1 Hi from myfunc 3515 Thread-1 Hi from myfunc 3761 MainThread Hello from main 4017 Thread-1 Hi from myfunc 4513 MainThread Hello from main 4518 Thread-1 Hi from myfunc This shows the logging output interspersed as one might expect. This approach works for more threads than shown here, of course. Multiple handlers and formatters ================================ Loggers are plain Python objects. The "addHandler()" method has no minimum or maximum quota for the number of handlers you may add. Sometimes it will be beneficial for an application to log all messages of all severities to a text file while simultaneously logging errors or above to the console. To set this up, simply configure the appropriate handlers. The logging calls in the application code will remain unchanged. Here is a slight modification to the previous simple module-based configuration example: import logging logger = logging.getLogger('simple_example') logger.setLevel(logging.DEBUG) # create file handler which logs even debug messages fh = logging.FileHandler('spam.log') fh.setLevel(logging.DEBUG) # create console handler with a higher log level ch = logging.StreamHandler() ch.setLevel(logging.ERROR) # create formatter and add it to the handlers formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') ch.setFormatter(formatter) fh.setFormatter(formatter) # add the handlers to logger logger.addHandler(ch) logger.addHandler(fh) # 'application' code logger.debug('debug message') logger.info('info message') logger.warning('warn message') logger.error('error message') logger.critical('critical message') Notice that the ‘application’ code does not care about multiple handlers. All that changed was the addition and configuration of a new handler named *fh*. The ability to create new handlers with higher- or lower-severity filters can be very helpful when writing and testing an application. Instead of using many "print" statements for debugging, use "logger.debug": Unlike the print statements, which you will have to delete or comment out later, the logger.debug statements can remain intact in the source code and remain dormant until you need them again. At that time, the only change that needs to happen is to modify the severity level of the logger and/or handler to debug. Logging to multiple destinations ================================ Let’s say you want to log to console and file with different message formats and in differing circumstances. 
Say you want to log messages with levels of DEBUG and higher to file, and those messages at level INFO and higher to the console. Let’s also assume that the file should contain timestamps, but the console messages should not. Here’s how you can achieve this: import logging # set up logging to file - see previous section for more details logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s', datefmt='%m-%d %H:%M', filename='/tmp/myapp.log', filemode='w') # define a Handler which writes INFO messages or higher to the sys.stderr console = logging.StreamHandler() console.setLevel(logging.INFO) # set a format which is simpler for console use formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s') # tell the handler to use this format console.setFormatter(formatter) # add the handler to the root logger logging.getLogger('').addHandler(console) # Now, we can log to the root logger, or any other logger. First the root... logging.info('Jackdaws love my big sphinx of quartz.') # Now, define a couple of other loggers which might represent areas in your # application: logger1 = logging.getLogger('myapp.area1') logger2 = logging.getLogger('myapp.area2') logger1.debug('Quick zephyrs blow, vexing daft Jim.') logger1.info('How quickly daft jumping zebras vex.') logger2.warning('Jail zesty vixen who grabbed pay from quack.') logger2.error('The five boxing wizards jump quickly.') When you run this, on the console you will see root : INFO Jackdaws love my big sphinx of quartz. myapp.area1 : INFO How quickly daft jumping zebras vex. myapp.area2 : WARNING Jail zesty vixen who grabbed pay from quack. myapp.area2 : ERROR The five boxing wizards jump quickly. and in the file you will see something like 10-22 22:19 root INFO Jackdaws love my big sphinx of quartz. 10-22 22:19 myapp.area1 DEBUG Quick zephyrs blow, vexing daft Jim. 10-22 22:19 myapp.area1 INFO How quickly daft jumping zebras vex. 10-22 22:19 myapp.area2 WARNING Jail zesty vixen who grabbed pay from quack. 10-22 22:19 myapp.area2 ERROR The five boxing wizards jump quickly. As you can see, the DEBUG message only shows up in the file. The other messages are sent to both destinations. This example uses console and file handlers, but you can use any number and combination of handlers you choose. Note that the above choice of log filename "/tmp/myapp.log" implies use of a standard location for temporary files on POSIX systems. On Windows, you may need to choose a different directory name for the log - just ensure that the directory exists and that you have the permissions to create and update files in it. Custom handling of levels ========================= Sometimes, you might want to do something slightly different from the standard handling of levels in handlers, where all levels above a threshold get processed by a handler. To do this, you need to use filters. 
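As a minimal code-based sketch of the idea (the handler, levels and
messages here are just illustrative; the configuration-based example
below develops it fully), a filter attached to a handler can reject
records above a chosen level:

   import logging
   import sys

   handler = logging.StreamHandler(sys.stdout)
   handler.setLevel(logging.INFO)
   # Pass only records at or below WARNING; ERROR and above are
   # rejected by this handler (they could go to another handler).
   handler.addFilter(lambda record: record.levelno <= logging.WARNING)

   root = logging.getLogger()
   root.setLevel(logging.DEBUG)
   root.addHandler(handler)

   root.info('shown on stdout')
   root.error('not shown by this handler')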
Let’s look at a scenario where you want to arrange things as follows:

* Send messages of severity "INFO" and "WARNING" to "sys.stdout"

* Send messages of severity "ERROR" and above to "sys.stderr"

* Send messages of severity "DEBUG" and above to file "app.log"

Suppose you configure logging with the following JSON:

   {
       "version": 1,
       "disable_existing_loggers": false,
       "formatters": {
           "simple": {
               "format": "%(levelname)-8s - %(message)s"
           }
       },
       "handlers": {
           "stdout": {
               "class": "logging.StreamHandler",
               "level": "INFO",
               "formatter": "simple",
               "stream": "ext://sys.stdout"
           },
           "stderr": {
               "class": "logging.StreamHandler",
               "level": "ERROR",
               "formatter": "simple",
               "stream": "ext://sys.stderr"
           },
           "file": {
               "class": "logging.FileHandler",
               "formatter": "simple",
               "filename": "app.log",
               "mode": "w"
           }
       },
       "root": {
           "level": "DEBUG",
           "handlers": [
               "stderr",
               "stdout",
               "file"
           ]
       }
   }

This configuration does *almost* what we want, except that
"sys.stdout" would show messages of severity "ERROR" and above, as
well as "INFO" and "WARNING" messages. To prevent this, we can set up
a filter which excludes those messages and add it to the relevant
handler. This can be configured by adding a "filters" section
parallel to "formatters" and "handlers":

   {
       "filters": {
           "warnings_and_below": {
               "()" : "__main__.filter_maker",
               "level": "WARNING"
           }
       }
   }

and changing the section on the "stdout" handler to add it:

   {
       "stdout": {
           "class": "logging.StreamHandler",
           "level": "INFO",
           "formatter": "simple",
           "stream": "ext://sys.stdout",
           "filters": ["warnings_and_below"]
       }
   }

A filter is just a function, so we can define the "filter_maker" (a
factory function) as follows:

   def filter_maker(level):
       level = getattr(logging, level)

       def filter(record):
           return record.levelno <= level

       return filter

This converts the string argument passed in to a numeric level, and
returns a function which only returns "True" if the level of the
passed in record is at or below the specified level.

Note that in this example I have defined the "filter_maker" in a test
script "main.py" that I run from the command line, so its module will
be "__main__" - hence the "__main__.filter_maker" in the filter
configuration. You will need to change that if you define it in a
different module.
With the filter added, we can run "main.py", which in full is: import json import logging import logging.config CONFIG = ''' { "version": 1, "disable_existing_loggers": false, "formatters": { "simple": { "format": "%(levelname)-8s - %(message)s" } }, "filters": { "warnings_and_below": { "()" : "__main__.filter_maker", "level": "WARNING" } }, "handlers": { "stdout": { "class": "logging.StreamHandler", "level": "INFO", "formatter": "simple", "stream": "ext://sys.stdout", "filters": ["warnings_and_below"] }, "stderr": { "class": "logging.StreamHandler", "level": "ERROR", "formatter": "simple", "stream": "ext://sys.stderr" }, "file": { "class": "logging.FileHandler", "formatter": "simple", "filename": "app.log", "mode": "w" } }, "root": { "level": "DEBUG", "handlers": [ "stderr", "stdout", "file" ] } } ''' def filter_maker(level): level = getattr(logging, level) def filter(record): return record.levelno <= level return filter logging.config.dictConfig(json.loads(CONFIG)) logging.debug('A DEBUG message') logging.info('An INFO message') logging.warning('A WARNING message') logging.error('An ERROR message') logging.critical('A CRITICAL message') And after running it like this: python main.py 2>stderr.log >stdout.log We can see the results are as expected: $ more *.log :::::::::::::: app.log :::::::::::::: DEBUG - A DEBUG message INFO - An INFO message WARNING - A WARNING message ERROR - An ERROR message CRITICAL - A CRITICAL message :::::::::::::: stderr.log :::::::::::::: ERROR - An ERROR message CRITICAL - A CRITICAL message :::::::::::::: stdout.log :::::::::::::: INFO - An INFO message WARNING - A WARNING message Configuration server example ============================ Here is an example of a module using the logging configuration server: import logging import logging.config import time import os # read initial config file logging.config.fileConfig('logging.conf') # create and start listener on port 9999 t = logging.config.listen(9999) t.start() logger = logging.getLogger('simpleExample') try: # loop through logging calls to see the difference # new configurations make, until Ctrl+C is pressed while True: logger.debug('debug message') logger.info('info message') logger.warning('warn message') logger.error('error message') logger.critical('critical message') time.sleep(5) except KeyboardInterrupt: # cleanup logging.config.stopListening() t.join() And here is a script that takes a filename and sends that file to the server, properly preceded with the binary-encoded length, as the new logging configuration: #!/usr/bin/env python import socket, sys, struct with open(sys.argv[1], 'rb') as f: data_to_send = f.read() HOST = 'localhost' PORT = 9999 s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) print('connecting...') s.connect((HOST, PORT)) print('sending config...') s.send(struct.pack('>L', len(data_to_send))) s.send(data_to_send) s.close() print('complete') Dealing with handlers that block ================================ Sometimes you have to get your logging handlers to do their work without blocking the thread you’re logging from. This is common in web applications, though of course it also occurs in other scenarios. A common culprit which demonstrates sluggish behaviour is the "SMTPHandler": sending emails can take a long time, for a number of reasons outside the developer’s control (for example, a poorly performing mail or network infrastructure). 
But almost any network-based handler can block: even a
"SocketHandler" operation may do a DNS query under the hood which is
too slow (and this query can be deep in the socket library code,
below the Python layer, and outside your control).

One solution is to use a two-part approach. For the first part,
attach only a "QueueHandler" to those loggers which are accessed from
performance-critical threads. They simply write to their queue, which
can be sized to a large enough capacity or initialized with no upper
bound on its size. The write to the queue will typically be accepted
quickly, though you will probably need to catch the "queue.Full"
exception as a precaution in your code. If you are a library
developer who has performance-critical threads in your code, be sure
to document this (together with a suggestion to attach only
"QueueHandlers" to your loggers) for the benefit of other developers
who will use your code.

The second part of the solution is "QueueListener", which has been
designed as the counterpart to "QueueHandler". A "QueueListener" is
very simple: it’s passed a queue and some handlers, and it fires up
an internal thread which listens to its queue for LogRecords sent
from "QueueHandlers" (or any other source of "LogRecords", for that
matter). The "LogRecords" are removed from the queue and passed to
the handlers for processing.

The advantage of having a separate "QueueListener" class is that you
can use the same instance to service multiple "QueueHandlers". This
is more resource-friendly than, say, having threaded versions of the
existing handler classes, which would eat up one thread per handler
for no particular benefit.

An example of using these two classes follows (imports omitted):

   que = queue.Queue(-1)  # no limit on size
   queue_handler = QueueHandler(que)
   handler = logging.StreamHandler()
   listener = QueueListener(que, handler)
   root = logging.getLogger()
   root.addHandler(queue_handler)
   formatter = logging.Formatter('%(threadName)s: %(message)s')
   handler.setFormatter(formatter)
   listener.start()
   # The log output will display the thread which generated
   # the event (the main thread) rather than the internal
   # thread which monitors the internal queue. This is what
   # you want to happen.
   root.warning('Look out!')
   listener.stop()

which, when run, will produce:

   MainThread: Look out!

Note: Although the earlier discussion wasn’t specifically about
  async code, but rather about slow logging handlers, it should be
  noted that when logging from async code, network and even file
  handlers could lead to problems (blocking the event loop) because
  some logging is done from "asyncio" internals. It might be best, if
  any async code is used in an application, to use the above approach
  for logging, so that any blocking code runs only in the
  "QueueListener" thread.

Changed in version 3.5: Prior to Python 3.5, the "QueueListener"
always passed every message received from the queue to every handler
it was initialized with. (This was because it was assumed that level
filtering was all done on the other side, where the queue is filled.)
From 3.5 onwards, this behaviour can be changed by passing a keyword
argument "respect_handler_level=True" to the listener’s constructor.
When this is done, the listener compares the level of each message
with the handler’s level, and only passes a message to a handler if
it’s appropriate to do so.
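For example, a minimal sketch of "respect_handler_level" in action
(the handler choice and levels are just illustrative):

   import logging
   import queue
   from logging.handlers import QueueHandler, QueueListener

   que = queue.Queue(-1)
   root = logging.getLogger()
   root.setLevel(logging.DEBUG)
   root.addHandler(QueueHandler(que))

   console = logging.StreamHandler()
   console.setLevel(logging.WARNING)

   # The listener checks each record's level against the handler's
   # level before passing the record on.
   listener = QueueListener(que, console, respect_handler_level=True)
   listener.start()
   root.debug('dropped - below the console handler level')
   root.warning('passed through to the console')
   listener.stop()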
Sending and receiving logging events across a network ===================================================== Let’s say you want to send logging events across a network, and handle them at the receiving end. A simple way of doing this is attaching a "SocketHandler" instance to the root logger at the sending end: import logging, logging.handlers rootLogger = logging.getLogger('') rootLogger.setLevel(logging.DEBUG) socketHandler = logging.handlers.SocketHandler('localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT) # don't bother with a formatter, since a socket handler sends the event as # an unformatted pickle rootLogger.addHandler(socketHandler) # Now, we can log to the root logger, or any other logger. First the root... logging.info('Jackdaws love my big sphinx of quartz.') # Now, define a couple of other loggers which might represent areas in your # application: logger1 = logging.getLogger('myapp.area1') logger2 = logging.getLogger('myapp.area2') logger1.debug('Quick zephyrs blow, vexing daft Jim.') logger1.info('How quickly daft jumping zebras vex.') logger2.warning('Jail zesty vixen who grabbed pay from quack.') logger2.error('The five boxing wizards jump quickly.') At the receiving end, you can set up a receiver using the "socketserver" module. Here is a basic working example: import pickle import logging import logging.handlers import socketserver import struct class LogRecordStreamHandler(socketserver.StreamRequestHandler): """Handler for a streaming logging request. This basically logs the record using whatever logging policy is configured locally. """ def handle(self): """ Handle multiple requests - each expected to be a 4-byte length, followed by the LogRecord in pickle format. Logs the record according to whatever policy is configured locally. """ while True: chunk = self.connection.recv(4) if len(chunk) < 4: break slen = struct.unpack('>L', chunk)[0] chunk = self.connection.recv(slen) while len(chunk) < slen: chunk = chunk + self.connection.recv(slen - len(chunk)) obj = self.unPickle(chunk) record = logging.makeLogRecord(obj) self.handleLogRecord(record) def unPickle(self, data): return pickle.loads(data) def handleLogRecord(self, record): # if a name is specified, we use the named logger rather than the one # implied by the record. if self.server.logname is not None: name = self.server.logname else: name = record.name logger = logging.getLogger(name) # N.B. EVERY record gets logged. This is because Logger.handle # is normally called AFTER logger-level filtering. If you want # to do filtering, do it at the client end to save wasting # cycles and network bandwidth! logger.handle(record) class LogRecordSocketReceiver(socketserver.ThreadingTCPServer): """ Simple TCP socket-based logging receiver suitable for testing. """ allow_reuse_address = True def __init__(self, host='localhost', port=logging.handlers.DEFAULT_TCP_LOGGING_PORT, handler=LogRecordStreamHandler): socketserver.ThreadingTCPServer.__init__(self, (host, port), handler) self.abort = 0 self.timeout = 1 self.logname = None def serve_until_stopped(self): import select abort = 0 while not abort: rd, wr, ex = select.select([self.socket.fileno()], [], [], self.timeout) if rd: self.handle_request() abort = self.abort def main(): logging.basicConfig( format='%(relativeCreated)5d %(name)-15s %(levelname)-8s %(message)s') tcpserver = LogRecordSocketReceiver() print('About to start TCP server...') tcpserver.serve_until_stopped() if __name__ == '__main__': main() First run the server, and then the client. 
On the client side, nothing is printed on the console; on the server side, you should see something like:

   About to start TCP server...
      59 root            INFO     Jackdaws love my big sphinx of quartz.
      59 myapp.area1     DEBUG    Quick zephyrs blow, vexing daft Jim.
      69 myapp.area1     INFO     How quickly daft jumping zebras vex.
      69 myapp.area2     WARNING  Jail zesty vixen who grabbed pay from quack.
      69 myapp.area2     ERROR    The five boxing wizards jump quickly.

Note that there are some security issues with pickle in some scenarios. If these affect you, you can use an alternative serialization scheme by overriding the "makePickle()" method and implementing your alternative there, as well as adapting the above script to use your alternative serialization.

Running a logging socket listener in production
-----------------------------------------------

To run a logging listener in production, you may need to use a process-management tool such as Supervisor. Here is a Gist which provides the bare-bones files to run the above functionality using Supervisor. It consists of the following files:

+---------------------------+------------------------------------------------------+
| File                      | Purpose                                              |
|===========================|======================================================|
| "prepare.sh"              | A Bash script to prepare the environment for testing |
+---------------------------+------------------------------------------------------+
| "supervisor.conf"         | The Supervisor configuration file, which has entries |
|                           | for the listener and a multi-process web application |
+---------------------------+------------------------------------------------------+
| "ensure_app.sh"           | A Bash script to ensure that Supervisor is running   |
|                           | with the above configuration                         |
+---------------------------+------------------------------------------------------+
| "log_listener.py"         | The socket listener program which receives log       |
|                           | events and records them to a file                    |
+---------------------------+------------------------------------------------------+
| "listener.json"           | A JSON configuration file for the listener           |
+---------------------------+------------------------------------------------------+
| "main.py"                 | A simple web application which performs logging via  |
|                           | a socket connected to the listener                   |
+---------------------------+------------------------------------------------------+
| "webapp.json"             | A JSON configuration file for the web application    |
+---------------------------+------------------------------------------------------+
| "client.py"               | A Python script to exercise the web application      |
+---------------------------+------------------------------------------------------+

The web application uses Gunicorn, which is a popular web application server that starts multiple worker processes to handle requests. This example setup shows how the workers can write to the same log file without conflicting with one another — they all go through the socket listener.

To test these files, do the following in a POSIX environment:

1. Download the Gist as a ZIP archive using the Download ZIP button.

2. Unzip the above files from the archive into a scratch directory.

3. In the scratch directory, run "bash prepare.sh" to get things ready. This creates a "run" subdirectory to contain Supervisor-related and log files, and a "venv" subdirectory to contain a virtual environment into which "bottle", "gunicorn" and "supervisor" are installed.

4. Run "bash ensure_app.sh" to ensure that Supervisor is running with the above configuration.

5. Run "venv/bin/python client.py" to exercise the web application, which will lead to records being written to the log.

6. Inspect the log files in the "run" subdirectory.
You should see the most recent log lines in files matching the pattern "app.log*". They won't be in any particular order, since they have been handled concurrently by different worker processes in a non-deterministic way.

7. You can shut down the listener and the web application by running "venv/bin/supervisorctl -c supervisor.conf shutdown".

You may need to tweak the configuration files in the unlikely event that the configured ports clash with something else in your test environment. The default configuration uses a TCP socket on port 9020.

You can use a Unix Domain socket instead of a TCP socket by doing the following:

1. In "listener.json", add a "socket" key with the path to the domain socket you want to use. If this key is present, the listener listens on the corresponding domain socket and not on a TCP socket (the "port" key is ignored).

2. In "webapp.json", change the socket handler configuration dictionary so that the "host" value is the path to the domain socket, and set the "port" value to "null".

Adding contextual information to your logging output
====================================================

Sometimes you want logging output to contain contextual information in addition to the parameters passed to the logging call. For example, in a networked application, it may be desirable to log client-specific information in the log (e.g. remote client's username, or IP address). Although you could use the *extra* parameter to achieve this, it's not always convenient to pass the information in this way. While it might be tempting to create "Logger" instances on a per-connection basis, this is not a good idea because these instances are not garbage collected. Although this is not a problem in practice, the number of "Logger" instances depends on the level of granularity you want to use in logging an application, and it could become hard to manage if the number of "Logger" instances grows effectively unbounded.

Using LoggerAdapters to impart contextual information
-----------------------------------------------------

An easy way in which you can pass contextual information to be output along with logging event information is to use the "LoggerAdapter" class. This class is designed to look like a "Logger", so that you can call "debug()", "info()", "warning()", "error()", "exception()", "critical()" and "log()". These methods have the same signatures as their counterparts in "Logger", so you can use the two types of instances interchangeably.

When you create an instance of "LoggerAdapter", you pass it a "Logger" instance and a dict-like object which contains your contextual information. When you call one of the logging methods on an instance of "LoggerAdapter", it delegates the call to the underlying instance of "Logger" passed to its constructor, and arranges to pass the contextual information in the delegated call. Here's a snippet from the code of "LoggerAdapter":

   def debug(self, msg, /, *args, **kwargs):
       """
       Delegate a debug call to the underlying logger, after adding
       contextual information from this adapter instance.
       """
       msg, kwargs = self.process(msg, kwargs)
       self.logger.debug(msg, *args, **kwargs)

The "process()" method of "LoggerAdapter" is where the contextual information is added to the logging output. It's passed the message and keyword arguments of the logging call, and it passes back (potentially) modified versions of these to use in the call to the underlying logger.
The default implementation of this method leaves the message alone, but inserts an 'extra' key in the keyword arguments whose value is the dict-like object passed to the constructor. Of course, if you pass an 'extra' keyword argument in the call to the adapter, it will be silently overwritten.

The advantage of using 'extra' is that the values in the dict-like object are merged into the "LogRecord" instance's __dict__, allowing you to use customized strings with your "Formatter" instances which know about the keys of the dict-like object. If you need a different method, e.g. if you want to prepend or append the contextual information to the message string, you just need to subclass "LoggerAdapter" and override "process()" to do what you need. Here is a simple example:

   class CustomAdapter(logging.LoggerAdapter):
       """
       This example adapter expects the passed in dict-like object to have a
       'connid' key, whose value in brackets is prepended to the log message.
       """
       def process(self, msg, kwargs):
           return '[%s] %s' % (self.extra['connid'], msg), kwargs

which you can use like this:

   logger = logging.getLogger(__name__)
   adapter = CustomAdapter(logger, {'connid': some_conn_id})

Then any events that you log to the adapter will have the value of "some_conn_id" prepended to the log messages.

Using objects other than dicts to pass contextual information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You don't need to pass an actual dict to a "LoggerAdapter" - you could pass an instance of a class which implements "__getitem__" and "__iter__" so that it looks like a dict to logging. This would be useful if you want to generate values dynamically (whereas the values in a dict would be constant).

Using Filters to impart contextual information
----------------------------------------------

You can also add contextual information to log output using a user-defined "Filter". "Filter" instances are allowed to modify the "LogRecords" passed to them, including adding additional attributes which can then be output using a suitable format string, or if needed a custom "Formatter".

For example, in a web application, the request being processed (or at least, the interesting parts of it) can be stored in a threadlocal ("threading.local") variable, and then accessed from a "Filter" to add information from the request - say, the remote IP address and remote user's username - to the "LogRecord", using the attribute names 'ip' and 'user' as in the "LoggerAdapter" example above. In that case, the same format string can be used to get similar output to that shown above. Here's an example script:

   import logging
   from random import choice

   class ContextFilter(logging.Filter):
       """
       This is a filter which injects contextual information into the log.

       Rather than use actual contextual information, we just use random
       data in this demo.
""" USERS = ['jim', 'fred', 'sheila'] IPS = ['123.231.231.123', '127.0.0.1', '192.168.0.1'] def filter(self, record): record.ip = choice(ContextFilter.IPS) record.user = choice(ContextFilter.USERS) return True if __name__ == '__main__': levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL) logging.basicConfig(level=logging.DEBUG, format='%(asctime)-15s %(name)-5s %(levelname)-8s IP: %(ip)-15s User: %(user)-8s %(message)s') a1 = logging.getLogger('a.b.c') a2 = logging.getLogger('d.e.f') f = ContextFilter() a1.addFilter(f) a2.addFilter(f) a1.debug('A debug message') a1.info('An info message with %s', 'some parameters') for x in range(10): lvl = choice(levels) lvlname = logging.getLevelName(lvl) a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters') which, when run, produces something like: 2010-09-06 22:38:15,292 a.b.c DEBUG IP: 123.231.231.123 User: fred A debug message 2010-09-06 22:38:15,300 a.b.c INFO IP: 192.168.0.1 User: sheila An info message with some parameters 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 127.0.0.1 User: sheila A message at CRITICAL level with 2 parameters 2010-09-06 22:38:15,300 d.e.f ERROR IP: 127.0.0.1 User: jim A message at ERROR level with 2 parameters 2010-09-06 22:38:15,300 d.e.f DEBUG IP: 127.0.0.1 User: sheila A message at DEBUG level with 2 parameters 2010-09-06 22:38:15,300 d.e.f ERROR IP: 123.231.231.123 User: fred A message at ERROR level with 2 parameters 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 192.168.0.1 User: jim A message at CRITICAL level with 2 parameters 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 127.0.0.1 User: sheila A message at CRITICAL level with 2 parameters 2010-09-06 22:38:15,300 d.e.f DEBUG IP: 192.168.0.1 User: jim A message at DEBUG level with 2 parameters 2010-09-06 22:38:15,301 d.e.f ERROR IP: 127.0.0.1 User: sheila A message at ERROR level with 2 parameters 2010-09-06 22:38:15,301 d.e.f DEBUG IP: 123.231.231.123 User: fred A message at DEBUG level with 2 parameters 2010-09-06 22:38:15,301 d.e.f INFO IP: 123.231.231.123 User: fred A message at INFO level with 2 parameters Use of "contextvars" ==================== Since Python 3.7, the "contextvars" module has provided context-local storage which works for both "threading" and "asyncio" processing needs. This type of storage may thus be generally preferable to thread-locals. The following example shows how, in a multi-threaded environment, logs can populated with contextual information such as, for example, request attributes handled by web applications. For the purposes of illustration, say that you have different web applications, each independent of the other but running in the same Python process and using a library common to them. How can each of these applications have their own log, where all logging messages from the library (and other request processing code) are directed to the appropriate application’s log file, while including in the log additional contextual information such as client IP, HTTP request method and client username? Let’s assume that the library can be simulated by the following code: # webapplib.py import logging import time logger = logging.getLogger(__name__) def useful(): # Just a representative event logged from the library logger.debug('Hello from webapplib!') # Just sleep for a bit so other threads get to run time.sleep(0.01) We can simulate the multiple web applications by means of two simple classes, "Request" and "WebApp". 
These simulate how real threaded web applications work - each request is handled by a thread:

   # main.py
   import argparse
   from contextvars import ContextVar
   import logging
   import os
   from random import choice
   import threading
   import webapplib

   logger = logging.getLogger(__name__)
   root = logging.getLogger()
   root.setLevel(logging.DEBUG)

   class Request:
       """
       A simple dummy request class which just holds dummy HTTP request method,
       client IP address and client username
       """
       def __init__(self, method, ip, user):
           self.method = method
           self.ip = ip
           self.user = user

   # A dummy set of requests which will be used in the simulation - we'll just pick
   # from this list randomly. Note that all GET requests are from 192.168.2.XXX
   # addresses, whereas POST requests are from 192.168.3.XXX addresses. Three users
   # are represented in the sample requests.

   REQUESTS = [
       Request('GET', '192.168.2.20', 'jim'),
       Request('POST', '192.168.3.20', 'fred'),
       Request('GET', '192.168.2.21', 'sheila'),
       Request('POST', '192.168.3.21', 'jim'),
       Request('GET', '192.168.2.22', 'fred'),
       Request('POST', '192.168.3.22', 'sheila'),
   ]

   # Note that the format string includes references to request context information
   # such as HTTP method, client IP and username

   formatter = logging.Formatter('%(threadName)-11s %(appName)s %(name)-9s %(user)-6s %(ip)s %(method)-4s %(message)s')

   # Create our context variables. These will be filled at the start of request
   # processing, and used in the logging that happens during that processing

   ctx_request = ContextVar('request')
   ctx_appname = ContextVar('appname')

   class InjectingFilter(logging.Filter):
       """
       A filter which injects context-specific information into logs and ensures
       that only information for a specific webapp is included in its log
       """
       def __init__(self, app):
           self.app = app

       def filter(self, record):
           request = ctx_request.get()
           record.method = request.method
           record.ip = request.ip
           record.user = request.user
           record.appName = appName = ctx_appname.get()
           return appName == self.app.name

   class WebApp:
       """
       A dummy web application class which has its own handler and filter for a
       webapp-specific log.
       """
       def __init__(self, name):
           self.name = name
           handler = logging.FileHandler(name + '.log', 'w')
           f = InjectingFilter(self)
           handler.setFormatter(formatter)
           handler.addFilter(f)
           root.addHandler(handler)
           self.num_requests = 0

       def process_request(self, request):
           """
           This is the dummy method for processing a request. It's called on a
           different thread for every request. We store the context information into
           the context vars before doing anything else.
""" ctx_request.set(request) ctx_appname.set(self.name) self.num_requests += 1 logger.debug('Request processing started') webapplib.useful() logger.debug('Request processing finished') def main(): fn = os.path.splitext(os.path.basename(__file__))[0] adhf = argparse.ArgumentDefaultsHelpFormatter ap = argparse.ArgumentParser(formatter_class=adhf, prog=fn, description='Simulate a couple of web ' 'applications handling some ' 'requests, showing how request ' 'context can be used to ' 'populate logs') aa = ap.add_argument aa('--count', '-c', type=int, default=100, help='How many requests to simulate') options = ap.parse_args() # Create the dummy webapps and put them in a list which we can use to select # from randomly app1 = WebApp('app1') app2 = WebApp('app2') apps = [app1, app2] threads = [] # Add a common handler which will capture all events handler = logging.FileHandler('app.log', 'w') handler.setFormatter(formatter) root.addHandler(handler) # Generate calls to process requests for i in range(options.count): try: # Pick an app at random and a request for it to process app = choice(apps) request = choice(REQUESTS) # Process the request in its own thread t = threading.Thread(target=app.process_request, args=(request,)) threads.append(t) t.start() except KeyboardInterrupt: break # Wait for the threads to terminate for t in threads: t.join() for app in apps: print('%s processed %s requests' % (app.name, app.num_requests)) if __name__ == '__main__': main() If you run the above, you should find that roughly half the requests go into "app1.log" and the rest into "app2.log", and the all the requests are logged to "app.log". Each webapp-specific log will contain only log entries for only that webapp, and the request information will be displayed consistently in the log (i.e. the information in each dummy request will always appear together in a log line). This is illustrated by the following shell output: ~/logging-contextual-webapp$ python main.py app1 processed 51 requests app2 processed 49 requests ~/logging-contextual-webapp$ wc -l *.log 153 app1.log 147 app2.log 300 app.log 600 total ~/logging-contextual-webapp$ head -3 app1.log Thread-3 (process_request) app1 __main__ jim 192.168.3.21 POST Request processing started Thread-3 (process_request) app1 webapplib jim 192.168.3.21 POST Hello from webapplib! Thread-5 (process_request) app1 __main__ jim 192.168.3.21 POST Request processing started ~/logging-contextual-webapp$ head -3 app2.log Thread-1 (process_request) app2 __main__ sheila 192.168.2.21 GET Request processing started Thread-1 (process_request) app2 webapplib sheila 192.168.2.21 GET Hello from webapplib! Thread-2 (process_request) app2 __main__ jim 192.168.2.20 GET Request processing started ~/logging-contextual-webapp$ head app.log Thread-1 (process_request) app2 __main__ sheila 192.168.2.21 GET Request processing started Thread-1 (process_request) app2 webapplib sheila 192.168.2.21 GET Hello from webapplib! Thread-2 (process_request) app2 __main__ jim 192.168.2.20 GET Request processing started Thread-3 (process_request) app1 __main__ jim 192.168.3.21 POST Request processing started Thread-2 (process_request) app2 webapplib jim 192.168.2.20 GET Hello from webapplib! Thread-3 (process_request) app1 webapplib jim 192.168.3.21 POST Hello from webapplib! 
Thread-4 (process_request) app2 __main__ fred 192.168.2.22 GET Request processing started Thread-5 (process_request) app1 __main__ jim 192.168.3.21 POST Request processing started Thread-4 (process_request) app2 webapplib fred 192.168.2.22 GET Hello from webapplib! Thread-6 (process_request) app1 __main__ jim 192.168.3.21 POST Request processing started ~/logging-contextual-webapp$ grep app1 app1.log | wc -l 153 ~/logging-contextual-webapp$ grep app2 app2.log | wc -l 147 ~/logging-contextual-webapp$ grep app1 app.log | wc -l 153 ~/logging-contextual-webapp$ grep app2 app.log | wc -l 147 Imparting contextual information in handlers ============================================ Each "Handler" has its own chain of filters. If you want to add contextual information to a "LogRecord" without leaking it to other handlers, you can use a filter that returns a new "LogRecord" instead of modifying it in-place, as shown in the following script: import copy import logging def filter(record: logging.LogRecord): record = copy.copy(record) record.user = 'jim' return record if __name__ == '__main__': logger = logging.getLogger() logger.setLevel(logging.INFO) handler = logging.StreamHandler() formatter = logging.Formatter('%(message)s from %(user)-8s') handler.setFormatter(formatter) handler.addFilter(filter) logger.addHandler(handler) logger.info('A log message') Logging to a single file from multiple processes ================================================ Although logging is thread-safe, and logging to a single file from multiple threads in a single process *is* supported, logging to a single file from *multiple processes* is *not* supported, because there is no standard way to serialize access to a single file across multiple processes in Python. If you need to log to a single file from multiple processes, one way of doing this is to have all the processes log to a "SocketHandler", and have a separate process which implements a socket server which reads from the socket and logs to file. (If you prefer, you can dedicate one thread in one of the existing processes to perform this function.) This section documents this approach in more detail and includes a working socket receiver which can be used as a starting point for you to adapt in your own applications. You could also write your own handler which uses the "Lock" class from the "multiprocessing" module to serialize access to the file from your processes. The stdlib "FileHandler" and subclasses do not make use of "multiprocessing". Alternatively, you can use a "Queue" and a "QueueHandler" to send all logging events to one of the processes in your multi-process application. The following example script demonstrates how you can do this; in the example a separate listener process listens for events sent by other processes and logs them according to its own logging configuration. 
Although the example only demonstrates one way of doing it (for example, you may want to use a listener thread rather than a separate listener process – the implementation would be analogous), it does allow for completely different logging configurations for the listener and the other processes in your application, and can be used as the basis for code meeting your own specific requirements:

   # You'll need these imports in your own code
   import logging
   import logging.handlers
   import multiprocessing

   # Next two import lines for this demo only
   from random import choice, random
   import time

   #
   # Because you'll want to define the logging configurations for listener and workers, the
   # listener and worker process functions take a configurer parameter which is a callable
   # for configuring logging for that process. These functions are also passed the queue,
   # which they use for communication.
   #
   # In practice, you can configure the listener however you want, but note that in this
   # simple example, the listener does not apply level or filter logic to received records.
   # In practice, you would probably want to do this logic in the worker processes, to avoid
   # sending events which would be filtered out between processes.
   #
   # The size of the rotated files is made small so you can see the results easily.
   def listener_configurer():
       root = logging.getLogger()
       h = logging.handlers.RotatingFileHandler('mptest.log', 'a', 300, 10)
       f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s %(message)s')
       h.setFormatter(f)
       root.addHandler(h)

   # This is the listener process top-level loop: wait for logging events
   # (LogRecords) on the queue and handle them, quit when you get a None for a
   # LogRecord.
   def listener_process(queue, configurer):
       configurer()
       while True:
           try:
               record = queue.get()
               if record is None:  # We send this as a sentinel to tell the listener to quit.
                   break
               logger = logging.getLogger(record.name)
               logger.handle(record)  # No level or filter logic applied - just do it!
           except Exception:
               import sys, traceback
               print('Whoops! Problem:', file=sys.stderr)
               traceback.print_exc(file=sys.stderr)

   # Arrays used for random selections in this demo

   LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING,
             logging.ERROR, logging.CRITICAL]

   LOGGERS = ['a.b.c', 'd.e.f']

   MESSAGES = [
       'Random message #1',
       'Random message #2',
       'Random message #3',
   ]

   # The worker configuration is done at the start of the worker process run.
   # Note that on Windows you can't rely on fork semantics, so each process
   # will run the logging configuration code when it starts.
   def worker_configurer(queue):
       h = logging.handlers.QueueHandler(queue)  # Just the one handler needed
       root = logging.getLogger()
       root.addHandler(h)
       # send all messages, for demo; no other level or filter logic applied.
       root.setLevel(logging.DEBUG)

   # This is the worker process top-level loop, which just logs ten events with
   # random intervening delays before terminating.
   # The print messages are just so you know it's doing something!
   def worker_process(queue, configurer):
       configurer(queue)
       name = multiprocessing.current_process().name
       print('Worker started: %s' % name)
       for i in range(10):
           time.sleep(random())
           logger = logging.getLogger(choice(LOGGERS))
           level = choice(LEVELS)
           message = choice(MESSAGES)
           logger.log(level, message)
       print('Worker finished: %s' % name)
   # Here's where the demo gets orchestrated. Create the queue, create and start
   # the listener, create ten workers and start them, wait for them to finish,
   # then send a None to the queue to tell the listener to finish.
   def main():
       queue = multiprocessing.Queue(-1)
       listener = multiprocessing.Process(target=listener_process,
                                          args=(queue, listener_configurer))
       listener.start()
       workers = []
       for i in range(10):
           worker = multiprocessing.Process(target=worker_process,
                                            args=(queue, worker_configurer))
           workers.append(worker)
           worker.start()
       for w in workers:
           w.join()
       queue.put_nowait(None)
       listener.join()

   if __name__ == '__main__':
       main()

A variant of the above script keeps the logging in the main process, in a separate thread:

   import logging
   import logging.config
   import logging.handlers
   from multiprocessing import Process, Queue
   import random
   import threading
   import time

   def logger_thread(q):
       while True:
           record = q.get()
           if record is None:
               break
           logger = logging.getLogger(record.name)
           logger.handle(record)

   def worker_process(q):
       qh = logging.handlers.QueueHandler(q)
       root = logging.getLogger()
       root.setLevel(logging.DEBUG)
       root.addHandler(qh)
       levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
                 logging.CRITICAL]
       loggers = ['foo', 'foo.bar', 'foo.bar.baz',
                  'spam', 'spam.ham', 'spam.ham.eggs']
       for i in range(100):
           lvl = random.choice(levels)
           logger = logging.getLogger(random.choice(loggers))
           logger.log(lvl, 'Message no. %d', i)

   if __name__ == '__main__':
       q = Queue()
       d = {
           'version': 1,
           'formatters': {
               'detailed': {
                   'class': 'logging.Formatter',
                   'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
               }
           },
           'handlers': {
               'console': {
                   'class': 'logging.StreamHandler',
                   'level': 'INFO',
               },
               'file': {
                   'class': 'logging.FileHandler',
                   'filename': 'mplog.log',
                   'mode': 'w',
                   'formatter': 'detailed',
               },
               'foofile': {
                   'class': 'logging.FileHandler',
                   'filename': 'mplog-foo.log',
                   'mode': 'w',
                   'formatter': 'detailed',
               },
               'errors': {
                   'class': 'logging.FileHandler',
                   'filename': 'mplog-errors.log',
                   'mode': 'w',
                   'level': 'ERROR',
                   'formatter': 'detailed',
               },
           },
           'loggers': {
               'foo': {
                   'handlers': ['foofile']
               }
           },
           'root': {
               'level': 'DEBUG',
               'handlers': ['console', 'file', 'errors']
           },
       }
       workers = []
       for i in range(5):
           wp = Process(target=worker_process, name='worker %d' % (i + 1),
                        args=(q,))
           workers.append(wp)
           wp.start()
       logging.config.dictConfig(d)
       lp = threading.Thread(target=logger_thread, args=(q,))
       lp.start()
       # At this point, the main process could do some useful work of its own
       # Once it's done that, it can wait for the workers to terminate...
       for wp in workers:
           wp.join()
       # And now tell the logging thread to finish up, too
       q.put(None)
       lp.join()

This variant shows how you can apply configuration for particular loggers - for example, the "foo" logger has a special handler which stores all events in the "foo" subsystem in the file "mplog-foo.log". This handler will be used by the logging machinery in the main process (even though the logging events are generated in the worker processes) to direct the messages to the appropriate destinations.

Using concurrent.futures.ProcessPoolExecutor
--------------------------------------------

If you want to use "concurrent.futures.ProcessPoolExecutor" to start your worker processes, you need to create the queue slightly differently.
Instead of

   queue = multiprocessing.Queue(-1)

you should use

   queue = multiprocessing.Manager().Queue(-1)  # also works with the examples above

and you can then replace this worker creation code:

   workers = []
   for i in range(10):
       worker = multiprocessing.Process(target=worker_process,
                                        args=(queue, worker_configurer))
       workers.append(worker)
       worker.start()
   for w in workers:
       w.join()

with this (remembering to first import "concurrent.futures"):

   with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
       for i in range(10):
           executor.submit(worker_process, queue, worker_configurer)

Deploying Web applications using Gunicorn and uWSGI
---------------------------------------------------

When deploying Web applications using Gunicorn or uWSGI (or similar), multiple worker processes are created to handle client requests. In such environments, avoid creating file-based handlers directly in your web application. Instead, use a "SocketHandler" to log from the web application to a listener in a separate process. This can be set up using a process management tool such as Supervisor - see Running a logging socket listener in production for more details.

Using file rotation
===================

Sometimes you want to let a log file grow to a certain size, then open a new file and log to that. You may want to keep a certain number of these files, and when that many files have been created, rotate the files so that the number of files and the size of the files both remain bounded. For this usage pattern, the logging package provides a "RotatingFileHandler":

   import glob
   import logging
   import logging.handlers

   LOG_FILENAME = 'logging_rotatingfile_example.out'

   # Set up a specific logger with our desired output level
   my_logger = logging.getLogger('MyLogger')
   my_logger.setLevel(logging.DEBUG)

   # Add the log message handler to the logger
   handler = logging.handlers.RotatingFileHandler(
                 LOG_FILENAME, maxBytes=20, backupCount=5)
   my_logger.addHandler(handler)

   # Log some messages
   for i in range(20):
       my_logger.debug('i = %d' % i)

   # See what files are created
   logfiles = glob.glob('%s*' % LOG_FILENAME)
   for filename in logfiles:
       print(filename)

The result should be 6 separate files, each with part of the log history for the application:

   logging_rotatingfile_example.out
   logging_rotatingfile_example.out.1
   logging_rotatingfile_example.out.2
   logging_rotatingfile_example.out.3
   logging_rotatingfile_example.out.4
   logging_rotatingfile_example.out.5

The most current file is always "logging_rotatingfile_example.out", and each time it reaches the size limit it is renamed with the suffix ".1". Each of the existing backup files is renamed to increment the suffix (".1" becomes ".2", etc.) and the ".6" file is erased.

Obviously this example sets the log length much too small as an extreme example. You would want to set *maxBytes* to an appropriate value.

Use of alternative formatting styles
====================================

When logging was added to the Python standard library, the only way of formatting messages with variable content was to use the %-formatting method. Since then, Python has gained two new formatting approaches: "string.Template" (added in Python 2.4) and "str.format()" (added in Python 2.6).

Logging (as of 3.2) provides improved support for these two additional formatting styles. The "Formatter" class has been enhanced to take an additional, optional keyword parameter named "style". This defaults to "'%'", but other possible values are "'{'" and "'$'", which correspond to the other two formatting styles.
Backwards compatibility is maintained by default (as you would expect), but by explicitly specifying a style parameter, you get the ability to specify format strings which work with "str.format()" or "string.Template". Here's an example console session to show the possibilities:

   >>> import logging
   >>> root = logging.getLogger()
   >>> root.setLevel(logging.DEBUG)
   >>> handler = logging.StreamHandler()
   >>> bf = logging.Formatter('{asctime} {name} {levelname:8s} {message}',
   ...                        style='{')
   >>> handler.setFormatter(bf)
   >>> root.addHandler(handler)
   >>> logger = logging.getLogger('foo.bar')
   >>> logger.debug('This is a DEBUG message')
   2010-10-28 15:11:55,341 foo.bar DEBUG    This is a DEBUG message
   >>> logger.critical('This is a CRITICAL message')
   2010-10-28 15:12:11,526 foo.bar CRITICAL This is a CRITICAL message
   >>> df = logging.Formatter('$asctime $name ${levelname} $message',
   ...                        style='$')
   >>> handler.setFormatter(df)
   >>> logger.debug('This is a DEBUG message')
   2010-10-28 15:13:06,924 foo.bar DEBUG This is a DEBUG message
   >>> logger.critical('This is a CRITICAL message')
   2010-10-28 15:13:11,494 foo.bar CRITICAL This is a CRITICAL message
   >>>

Note that the formatting of logging messages for final output to logs is completely independent of how an individual logging message is constructed. That can still use %-formatting, as shown here:

   >>> logger.error('This is an%s %s %s', 'other,', 'ERROR,', 'message')
   2010-10-28 15:19:29,833 foo.bar ERROR This is another, ERROR, message
   >>>

Logging calls ("logger.debug()", "logger.info()" etc.) only take positional parameters for the actual logging message itself, with keyword parameters used only for determining options for how to handle the actual logging call (e.g. the "exc_info" keyword parameter to indicate that traceback information should be logged, or the "extra" keyword parameter to indicate additional contextual information to be added to the log). So you cannot directly make logging calls using "str.format()" or "string.Template" syntax, because internally the logging package uses %-formatting to merge the format string and the variable arguments. There would be no changing this while preserving backward compatibility, since all logging calls which are out there in existing code will be using %-format strings.

There is, however, a way that you can use {}- and $-formatting to construct your individual log messages. Recall that for a message you can use an arbitrary object as a message format string, and that the logging package will call "str()" on that object to get the actual format string. Consider the following two classes:

   class BraceMessage:
       def __init__(self, fmt, /, *args, **kwargs):
           self.fmt = fmt
           self.args = args
           self.kwargs = kwargs

       def __str__(self):
           return self.fmt.format(*self.args, **self.kwargs)

   class DollarMessage:
       def __init__(self, fmt, /, **kwargs):
           self.fmt = fmt
           self.kwargs = kwargs

       def __str__(self):
           from string import Template
           return Template(self.fmt).substitute(**self.kwargs)

Either of these can be used in place of a format string, to allow {}- or $-formatting to be used to build the actual "message" part which appears in the formatted log output in place of "%(message)s" or "{message}" or "$message". It's a little unwieldy to use the class names whenever you want to log something, but it's quite palatable if you use an alias such as __ (double underscore — not to be confused with _, the single underscore used as a synonym/alias for "gettext.gettext()" or its brethren).
The above classes are not included in Python, though they’re easy enough to copy and paste into your own code. They can be used as follows (assuming that they’re declared in a module called "wherever"): >>> from wherever import BraceMessage as __ >>> print(__('Message with {0} {name}', 2, name='placeholders')) Message with 2 placeholders >>> class Point: pass ... >>> p = Point() >>> p.x = 0.5 >>> p.y = 0.5 >>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})', ... point=p)) Message with coordinates: (0.50, 0.50) >>> from wherever import DollarMessage as __ >>> print(__('Message with $num $what', num=2, what='placeholders')) Message with 2 placeholders >>> While the above examples use "print()" to show how the formatting works, you would of course use "logger.debug()" or similar to actually log using this approach. One thing to note is that you pay no significant performance penalty with this approach: the actual formatting happens not when you make the logging call, but when (and if) the logged message is actually about to be output to a log by a handler. So the only slightly unusual thing which might trip you up is that the parentheses go around the format string and the arguments, not just the format string. That’s because the __ notation is just syntax sugar for a constructor call to one of the "*XXX*Message" classes. If you prefer, you can use a "LoggerAdapter" to achieve a similar effect to the above, as in the following example: import logging class Message: def __init__(self, fmt, args): self.fmt = fmt self.args = args def __str__(self): return self.fmt.format(*self.args) class StyleAdapter(logging.LoggerAdapter): def log(self, level, msg, /, *args, stacklevel=1, **kwargs): if self.isEnabledFor(level): msg, kwargs = self.process(msg, kwargs) self.logger.log(level, Message(msg, args), **kwargs, stacklevel=stacklevel+1) logger = StyleAdapter(logging.getLogger(__name__)) def main(): logger.debug('Hello, {}', 'world!') if __name__ == '__main__': logging.basicConfig(level=logging.DEBUG) main() The above script should log the message "Hello, world!" when run with Python 3.8 or later. Customizing "LogRecord" ======================= Every logging event is represented by a "LogRecord" instance. When an event is logged and not filtered out by a logger’s level, a "LogRecord" is created, populated with information about the event and then passed to the handlers for that logger (and its ancestors, up to and including the logger where further propagation up the hierarchy is disabled). Before Python 3.2, there were only two places where this creation was done: * "Logger.makeRecord()", which is called in the normal process of logging an event. This invoked "LogRecord" directly to create an instance. * "makeLogRecord()", which is called with a dictionary containing attributes to be added to the LogRecord. This is typically invoked when a suitable dictionary has been received over the network (e.g. in pickle form via a "SocketHandler", or in JSON form via an "HTTPHandler"). This has usually meant that if you need to do anything special with a "LogRecord", you’ve had to do one of the following. * Create your own "Logger" subclass, which overrides "Logger.makeRecord()", and set it using "setLoggerClass()" before any loggers that you care about are instantiated. * Add a "Filter" to a logger or handler, which does the necessary special manipulation you need when its "filter()" method is called. 
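For illustration, the first of these approaches might look something like the following (a minimal sketch; the logger subclass and the attribute it adds are made up for this example):

   import logging

   class MyLogger(logging.Logger):
       def makeRecord(self, *args, **kwargs):
           record = super().makeRecord(*args, **kwargs)
           record.custom_attribute = 0xdecafbad  # add whatever you need here
           return record

   logging.setLoggerClass(MyLogger)
   logger = logging.getLogger(__name__)  # loggers created from now on use MyLogger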
The first approach would be a little unwieldy in the scenario where (say) several different libraries wanted to do different things. Each would attempt to set its own "Logger" subclass, and the one which did this last would win.

The second approach works reasonably well for many cases, but does not allow you to, for example, use a specialized subclass of "LogRecord". Library developers can set a suitable filter on their loggers, but they would have to remember to do this every time they introduced a new logger (which they would do simply by adding new packages or modules and doing

   logger = logging.getLogger(__name__)

at module level). It's probably one too many things to think about. Developers could also add the filter to a "NullHandler" attached to their top-level logger, but this would not be invoked if an application developer attached a handler to a lower-level library logger — so output from that handler would not reflect the intentions of the library developer.

In Python 3.2 and later, "LogRecord" creation is done through a factory, which you can specify. The factory is just a callable you can set with "setLogRecordFactory()", and interrogate with "getLogRecordFactory()". The factory is invoked with the same signature as the "LogRecord" constructor, as "LogRecord" is the default setting for the factory.

This approach allows a custom factory to control all aspects of LogRecord creation. For example, you could return a subclass, or just add some additional attributes to the record once created, using a pattern similar to this:

   old_factory = logging.getLogRecordFactory()

   def record_factory(*args, **kwargs):
       record = old_factory(*args, **kwargs)
       record.custom_attribute = 0xdecafbad
       return record

   logging.setLogRecordFactory(record_factory)

This pattern allows different libraries to chain factories together, and as long as they don't overwrite each other's attributes or unintentionally overwrite the attributes provided as standard, there should be no surprises. However, it should be borne in mind that each link in the chain adds run-time overhead to all logging operations, and the technique should only be used when the use of a "Filter" does not provide the desired result.

Subclassing QueueHandler and QueueListener - a ZeroMQ example
=============================================================

Subclass "QueueHandler"
-----------------------

You can use a "QueueHandler" subclass to send messages to other kinds of queues, for example a ZeroMQ 'publish' socket.
In the example below, the socket is created separately and passed to the handler (as its 'queue'):

   import zmq   # using pyzmq, the Python binding for ZeroMQ
   import json  # for serializing records portably

   ctx = zmq.Context()
   sock = zmq.Socket(ctx, zmq.PUB)  # or zmq.PUSH, or other suitable value
   sock.bind('tcp://*:5556')        # or wherever

   class ZeroMQSocketHandler(QueueHandler):
       def enqueue(self, record):
           self.queue.send_json(record.__dict__)

   handler = ZeroMQSocketHandler(sock)

Of course there are other ways of organizing this, for example passing in the data needed by the handler to create the socket:

   class ZeroMQSocketHandler(QueueHandler):
       def __init__(self, uri, socktype=zmq.PUB, ctx=None):
           self.ctx = ctx or zmq.Context()
           socket = zmq.Socket(self.ctx, socktype)
           socket.bind(uri)
           super().__init__(socket)

       def enqueue(self, record):
           self.queue.send_json(record.__dict__)

       def close(self):
           self.queue.close()

Subclass "QueueListener"
------------------------

You can also subclass "QueueListener" to get messages from other kinds of queues, for example a ZeroMQ 'subscribe' socket. Here's an example:

   class ZeroMQSocketListener(QueueListener):
       def __init__(self, uri, /, *handlers, **kwargs):
           self.ctx = kwargs.get('ctx') or zmq.Context()
           socket = zmq.Socket(self.ctx, zmq.SUB)
           socket.setsockopt_string(zmq.SUBSCRIBE, '')  # subscribe to everything
           socket.connect(uri)
           super().__init__(socket, *handlers, **kwargs)

       def dequeue(self):
           msg = self.queue.recv_json()
           return logging.makeLogRecord(msg)

Subclassing QueueHandler and QueueListener - a "pynng" example
==============================================================

In a similar way to the above section, we can implement a listener and handler using pynng, which is a Python binding to NNG, billed as a spiritual successor to ZeroMQ. The following snippets illustrate – you can test them in an environment which has "pynng" installed. Just for variety, we present the listener first.
Subclass "QueueListener" ------------------------ # listener.py import json import logging import logging.handlers import pynng DEFAULT_ADDR = "tcp://localhost:13232" interrupted = False class NNGSocketListener(logging.handlers.QueueListener): def __init__(self, uri, /, *handlers, **kwargs): # Have a timeout for interruptability, and open a # subscriber socket socket = pynng.Sub0(listen=uri, recv_timeout=500) # The b'' subscription matches all topics topics = kwargs.pop('topics', None) or b'' socket.subscribe(topics) # We treat the socket as a queue super().__init__(socket, *handlers, **kwargs) def dequeue(self, block): data = None # Keep looping while not interrupted and no data received over the # socket while not interrupted: try: data = self.queue.recv(block=block) break except pynng.Timeout: pass except pynng.Closed: # sometimes happens when you hit Ctrl-C break if data is None: return None # Get the logging event sent from a publisher event = json.loads(data.decode('utf-8')) return logging.makeLogRecord(event) def enqueue_sentinel(self): # Not used in this implementation, as the socket isn't really a # queue pass logging.getLogger('pynng').propagate = False listener = NNGSocketListener(DEFAULT_ADDR, logging.StreamHandler(), topics=b'') listener.start() print('Press Ctrl-C to stop.') try: while True: pass except KeyboardInterrupt: interrupted = True finally: listener.stop() Subclass "QueueHandler" ----------------------- # sender.py import json import logging import logging.handlers import time import random import pynng DEFAULT_ADDR = "tcp://localhost:13232" class NNGSocketHandler(logging.handlers.QueueHandler): def __init__(self, uri): socket = pynng.Pub0(dial=uri, send_timeout=500) super().__init__(socket) def enqueue(self, record): # Send the record as UTF-8 encoded JSON d = dict(record.__dict__) data = json.dumps(d) self.queue.send(data.encode('utf-8')) def close(self): self.queue.close() logging.getLogger('pynng').propagate = False handler = NNGSocketHandler(DEFAULT_ADDR) # Make sure the process ID is in the output logging.basicConfig(level=logging.DEBUG, handlers=[logging.StreamHandler(), handler], format='%(levelname)-8s %(name)10s %(process)6s %(message)s') levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL) logger_names = ('myapp', 'myapp.lib1', 'myapp.lib2') msgno = 1 while True: # Just randomly select some loggers and levels and log away level = random.choice(levels) logger = logging.getLogger(random.choice(logger_names)) logger.log(level, 'Message no. %5d' % msgno) msgno += 1 delay = random.random() * 2 + 0.5 time.sleep(delay) You can run the above two snippets in separate command shells. If we run the listener in one shell and run the sender in two separate shells, we should see something like the following. In the first sender shell: $ python sender.py DEBUG myapp 613 Message no. 1 WARNING myapp.lib2 613 Message no. 2 CRITICAL myapp.lib2 613 Message no. 3 WARNING myapp.lib2 613 Message no. 4 CRITICAL myapp.lib1 613 Message no. 5 DEBUG myapp 613 Message no. 6 CRITICAL myapp.lib1 613 Message no. 7 INFO myapp.lib1 613 Message no. 8 (and so on) In the second sender shell: $ python sender.py INFO myapp.lib2 657 Message no. 1 CRITICAL myapp.lib2 657 Message no. 2 CRITICAL myapp 657 Message no. 3 CRITICAL myapp.lib1 657 Message no. 4 INFO myapp.lib1 657 Message no. 5 WARNING myapp.lib2 657 Message no. 6 CRITICAL myapp 657 Message no. 7 DEBUG myapp.lib1 657 Message no. 
8 (and so on) In the listener shell: $ python listener.py Press Ctrl-C to stop. DEBUG myapp 613 Message no. 1 WARNING myapp.lib2 613 Message no. 2 INFO myapp.lib2 657 Message no. 1 CRITICAL myapp.lib2 613 Message no. 3 CRITICAL myapp.lib2 657 Message no. 2 CRITICAL myapp 657 Message no. 3 WARNING myapp.lib2 613 Message no. 4 CRITICAL myapp.lib1 613 Message no. 5 CRITICAL myapp.lib1 657 Message no. 4 INFO myapp.lib1 657 Message no. 5 DEBUG myapp 613 Message no. 6 WARNING myapp.lib2 657 Message no. 6 CRITICAL myapp 657 Message no. 7 CRITICAL myapp.lib1 613 Message no. 7 INFO myapp.lib1 613 Message no. 8 DEBUG myapp.lib1 657 Message no. 8 (and so on) As you can see, the logging from the two sender processes is interleaved in the listener’s output. An example dictionary-based configuration ========================================= Below is an example of a logging configuration dictionary - it’s taken from the documentation on the Django project. This dictionary is passed to "dictConfig()" to put the configuration into effect: LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'verbose': { 'format': '{levelname} {asctime} {module} {process:d} {thread:d} {message}', 'style': '{', }, 'simple': { 'format': '{levelname} {message}', 'style': '{', }, }, 'filters': { 'special': { '()': 'project.logging.SpecialFilter', 'foo': 'bar', }, }, 'handlers': { 'console': { 'level': 'INFO', 'class': 'logging.StreamHandler', 'formatter': 'simple', }, 'mail_admins': { 'level': 'ERROR', 'class': 'django.utils.log.AdminEmailHandler', 'filters': ['special'] } }, 'loggers': { 'django': { 'handlers': ['console'], 'propagate': True, }, 'django.request': { 'handlers': ['mail_admins'], 'level': 'ERROR', 'propagate': False, }, 'myproject.custom': { 'handlers': ['console', 'mail_admins'], 'level': 'INFO', 'filters': ['special'] } } } For more information about this configuration, you can see the relevant section of the Django documentation. Using a rotator and namer to customize log rotation processing ============================================================== An example of how you can define a namer and rotator is given in the following runnable script, which shows gzip compression of the log file: import gzip import logging import logging.handlers import os import shutil def namer(name): return name + ".gz" def rotator(source, dest): with open(source, 'rb') as f_in: with gzip.open(dest, 'wb') as f_out: shutil.copyfileobj(f_in, f_out) os.remove(source) rh = logging.handlers.RotatingFileHandler('rotated.log', maxBytes=128, backupCount=5) rh.rotator = rotator rh.namer = namer root = logging.getLogger() root.setLevel(logging.INFO) root.addHandler(rh) f = logging.Formatter('%(asctime)s %(message)s') rh.setFormatter(f) for i in range(1000): root.info(f'Message no. {i + 1}') After running this, you will see six new files, five of which are compressed: $ ls rotated.log* rotated.log rotated.log.2.gz rotated.log.4.gz rotated.log.1.gz rotated.log.3.gz rotated.log.5.gz $ zcat rotated.log.1.gz 2023-01-20 02:28:17,767 Message no. 996 2023-01-20 02:28:17,767 Message no. 997 2023-01-20 02:28:17,767 Message no. 998 A more elaborate multiprocessing example ======================================== The following working example shows how logging can be used with multiprocessing using configuration files. The configurations are fairly simple, but serve to illustrate how more complex ones could be implemented in a real multiprocessing scenario. 
In the example, the main process spawns a listener process and some worker processes. There are three separate configurations: one for the main process, one for the listener and one shared by all the workers. We can see logging in the main process, how the workers log to a QueueHandler, and how the listener implements a QueueListener with a more complex logging configuration, arranging to dispatch events received via the queue to the handlers specified in that configuration. Note that these configurations are purely illustrative, but you should be able to adapt this example to your own scenario.

Here's the script - the docstrings and the comments hopefully explain how it works:

   import logging
   import logging.config
   import logging.handlers
   from multiprocessing import Process, Queue, Event, current_process
   import os
   import random
   import time

   class MyHandler:
       """
       A simple handler for logging events. It runs in the listener process and
       dispatches events to loggers based on the name in the received record,
       which then get dispatched, by the logging system, to the handlers
       configured for those loggers.
       """

       def handle(self, record):
           if record.name == "root":
               logger = logging.getLogger()
           else:
               logger = logging.getLogger(record.name)

           if logger.isEnabledFor(record.levelno):
               # The process name is transformed just to show that it's the listener
               # doing the logging to files and console
               record.processName = '%s (for %s)' % (current_process().name, record.processName)
               logger.handle(record)

   def listener_process(q, stop_event, config):
       """
       This could be done in the main process, but is just done in a separate
       process for illustrative purposes.

       This initialises logging according to the specified configuration,
       starts the listener and waits for the main process to signal completion
       via the event. The listener is then stopped, and the process exits.
       """
       logging.config.dictConfig(config)
       listener = logging.handlers.QueueListener(q, MyHandler())
       listener.start()
       if os.name == 'posix':
           # On POSIX, the setup logger will have been configured in the
           # parent process, but should have been disabled following the
           # dictConfig call.
           # On Windows, since fork isn't used, the setup logger won't
           # exist in the child, so it would be created and the message
           # would appear - hence the "if posix" clause.
           logger = logging.getLogger('setup')
           logger.critical('Should not appear, because of disabled logger ...')
       stop_event.wait()
       listener.stop()

   def worker_process(config):
       """
       A number of these are spawned for the purpose of illustration. In
       practice, they could be a heterogeneous bunch of processes rather than
       ones which are identical to each other.

       This initialises logging according to the specified configuration,
       and logs a hundred messages with random levels to randomly selected
       loggers.

       A small sleep is added to allow other processes a chance to run. This
       is not strictly needed, but it mixes the output from the different
       processes a bit more than if it's left out.
       """
       logging.config.dictConfig(config)
       levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
                 logging.CRITICAL]
       loggers = ['foo', 'foo.bar', 'foo.bar.baz',
                  'spam', 'spam.ham', 'spam.ham.eggs']
       if os.name == 'posix':
           # On POSIX, the setup logger will have been configured in the
           # parent process, but should have been disabled following the
           # dictConfig call.
           # On Windows, since fork isn't used, the setup logger won't
           # exist in the child, so it would be created and the message
           # would appear - hence the "if posix" clause.
logger = logging.getLogger('setup') logger.critical('Should not appear, because of disabled logger ...') for i in range(100): lvl = random.choice(levels) logger = logging.getLogger(random.choice(loggers)) logger.log(lvl, 'Message no. %d', i) time.sleep(0.01) def main(): q = Queue() # The main process gets a simple configuration which prints to the console. config_initial = { 'version': 1, 'handlers': { 'console': { 'class': 'logging.StreamHandler', 'level': 'INFO' } }, 'root': { 'handlers': ['console'], 'level': 'DEBUG' } } # The worker process configuration is just a QueueHandler attached to the # root logger, which allows all messages to be sent to the queue. # We disable existing loggers to disable the "setup" logger used in the # parent process. This is needed on POSIX because the logger will # be there in the child following a fork(). config_worker = { 'version': 1, 'disable_existing_loggers': True, 'handlers': { 'queue': { 'class': 'logging.handlers.QueueHandler', 'queue': q } }, 'root': { 'handlers': ['queue'], 'level': 'DEBUG' } } # The listener process configuration shows that the full flexibility of # logging configuration is available to dispatch events to handlers however # you want. # We disable existing loggers to disable the "setup" logger used in the # parent process. This is needed on POSIX because the logger will # be there in the child following a fork(). config_listener = { 'version': 1, 'disable_existing_loggers': True, 'formatters': { 'detailed': { 'class': 'logging.Formatter', 'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s' }, 'simple': { 'class': 'logging.Formatter', 'format': '%(name)-15s %(levelname)-8s %(processName)-10s %(message)s' } }, 'handlers': { 'console': { 'class': 'logging.StreamHandler', 'formatter': 'simple', 'level': 'INFO' }, 'file': { 'class': 'logging.FileHandler', 'filename': 'mplog.log', 'mode': 'w', 'formatter': 'detailed' }, 'foofile': { 'class': 'logging.FileHandler', 'filename': 'mplog-foo.log', 'mode': 'w', 'formatter': 'detailed' }, 'errors': { 'class': 'logging.FileHandler', 'filename': 'mplog-errors.log', 'mode': 'w', 'formatter': 'detailed', 'level': 'ERROR' } }, 'loggers': { 'foo': { 'handlers': ['foofile'] } }, 'root': { 'handlers': ['console', 'file', 'errors'], 'level': 'DEBUG' } } # Log some initial events, just to show that logging in the parent works # normally. logging.config.dictConfig(config_initial) logger = logging.getLogger('setup') logger.info('About to create workers ...') workers = [] for i in range(5): wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(config_worker,)) workers.append(wp) wp.start() logger.info('Started worker: %s', wp.name) logger.info('About to create listener ...') stop_event = Event() lp = Process(target=listener_process, name='listener', args=(q, stop_event, config_listener)) lp.start() logger.info('Started listener') # We now hang around for the workers to finish their work. for wp in workers: wp.join() # Workers all done, listening can now stop. # Logging in the parent still works normally. 
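    # Setting the event below wakes the stop_event.wait() call in
    # listener_process, which then stops its QueueListener and exits.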
logger.info('Telling listener to stop ...') stop_event.set() lp.join() logger.info('All done.') if __name__ == '__main__': main() Inserting a BOM into messages sent to a SysLogHandler ===================================================== **RFC 5424** requires that a Unicode message be sent to a syslog daemon as a set of bytes which have the following structure: an optional pure-ASCII component, followed by a UTF-8 Byte Order Mark (BOM), followed by Unicode encoded using UTF-8. (See the **relevant section of the specification**.) In Python 3.1, code was added to "SysLogHandler" to insert a BOM into the message, but unfortunately, it was implemented incorrectly, with the BOM appearing at the beginning of the message and hence not allowing any pure-ASCII component to appear before it. As this behaviour is broken, the incorrect BOM insertion code is being removed from Python 3.2.4 and later. However, it is not being replaced, and if you want to produce **RFC 5424**-compliant messages which include a BOM, an optional pure-ASCII sequence before it and arbitrary Unicode after it, encoded using UTF-8, then you need to do the following: 1. Attach a "Formatter" instance to your "SysLogHandler" instance, with a format string such as: 'ASCII section\ufeffUnicode section' The Unicode code point U+FEFF, when encoded using UTF-8, will be encoded as a UTF-8 BOM – the byte-string "b'\xef\xbb\xbf'". 2. Replace the ASCII section with whatever placeholders you like, but make sure that the data that appears in there after substitution is always ASCII (that way, it will remain unchanged after UTF-8 encoding). 3. Replace the Unicode section with whatever placeholders you like; if the data which appears there after substitution contains characters outside the ASCII range, that’s fine – it will be encoded using UTF-8. The formatted message *will* be encoded using UTF-8 encoding by "SysLogHandler". If you follow the above rules, you should be able to produce **RFC 5424**-compliant messages. If you don’t, logging may not complain, but your messages will not be RFC 5424-compliant, and your syslog daemon may complain. Implementing structured logging =============================== Although most logging messages are intended for reading by humans, and thus not readily machine-parseable, there might be circumstances where you want to output messages in a structured format which *is* capable of being parsed by a program (without needing complex regular expressions to parse the log message). This is straightforward to achieve using the logging package. There are a number of ways in which this could be achieved, but the following is a simple approach which uses JSON to serialise the event in a machine-parseable manner: import json import logging class StructuredMessage: def __init__(self, message, /, **kwargs): self.message = message self.kwargs = kwargs def __str__(self): return '%s >>> %s' % (self.message, json.dumps(self.kwargs)) _ = StructuredMessage # optional, to improve readability logging.basicConfig(level=logging.INFO, format='%(message)s') logging.info(_('message 1', foo='bar', bar='baz', num=123, fnum=123.456)) If the above script is run, it prints: message 1 >>> {"fnum": 123.456, "num": 123, "bar": "baz", "foo": "bar"} Note that the order of items might be different according to the version of Python used. 
If you need more specialised processing, you can use a custom JSON encoder, as in the following complete example: import json import logging class Encoder(json.JSONEncoder): def default(self, o): if isinstance(o, set): return tuple(o) elif isinstance(o, str): return o.encode('unicode_escape').decode('ascii') return super().default(o) class StructuredMessage: def __init__(self, message, /, **kwargs): self.message = message self.kwargs = kwargs def __str__(self): s = Encoder().encode(self.kwargs) return '%s >>> %s' % (self.message, s) _ = StructuredMessage # optional, to improve readability def main(): logging.basicConfig(level=logging.INFO, format='%(message)s') logging.info(_('message 1', set_value={1, 2, 3}, snowman='\u2603')) if __name__ == '__main__': main() When the above script is run, it prints: message 1 >>> {"snowman": "\u2603", "set_value": [1, 2, 3]} Note that the order of items might be different according to the version of Python used. Customizing handlers with "dictConfig()" ======================================== There are times when you want to customize logging handlers in particular ways, and if you use "dictConfig()" you may be able to do this without subclassing. As an example, consider that you may want to set the ownership of a log file. On POSIX, this is easily done using "shutil.chown()", but the file handlers in the stdlib don’t offer built-in support. You can customize handler creation using a plain function such as: def owned_file_handler(filename, mode='a', encoding=None, owner=None): if owner: if not os.path.exists(filename): open(filename, 'a').close() shutil.chown(filename, *owner) return logging.FileHandler(filename, mode, encoding) You can then specify, in a logging configuration passed to "dictConfig()", that a logging handler be created by calling this function: LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'default': { 'format': '%(asctime)s %(levelname)s %(name)s %(message)s' }, }, 'handlers': { 'file':{ # The values below are popped from this dictionary and # used to create the handler, set the handler's level and # its formatter. '()': owned_file_handler, 'level':'DEBUG', 'formatter': 'default', # The values below are passed to the handler creator callable # as keyword arguments. 'owner': ['pulse', 'pulse'], 'filename': 'chowntest.log', 'mode': 'w', 'encoding': 'utf-8', }, }, 'root': { 'handlers': ['file'], 'level': 'DEBUG', }, } In this example I am setting the ownership using the "pulse" user and group, just for the purposes of illustration. Putting it together into a working script, "chowntest.py": import logging, logging.config, os, shutil def owned_file_handler(filename, mode='a', encoding=None, owner=None): if owner: if not os.path.exists(filename): open(filename, 'a').close() shutil.chown(filename, *owner) return logging.FileHandler(filename, mode, encoding) LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'default': { 'format': '%(asctime)s %(levelname)s %(name)s %(message)s' }, }, 'handlers': { 'file':{ # The values below are popped from this dictionary and # used to create the handler, set the handler's level and # its formatter. '()': owned_file_handler, 'level':'DEBUG', 'formatter': 'default', # The values below are passed to the handler creator callable # as keyword arguments. 
'owner': ['pulse', 'pulse'], 'filename': 'chowntest.log', 'mode': 'w', 'encoding': 'utf-8', }, }, 'root': { 'handlers': ['file'], 'level': 'DEBUG', }, } logging.config.dictConfig(LOGGING) logger = logging.getLogger('mylogger') logger.debug('A debug message') To run this, you will probably need to run as "root": $ sudo python3.3 chowntest.py $ cat chowntest.log 2013-11-05 09:34:51,128 DEBUG mylogger A debug message $ ls -l chowntest.log -rw-r--r-- 1 pulse pulse 55 2013-11-05 09:34 chowntest.log Note that this example uses Python 3.3 because that’s where "shutil.chown()" makes an appearance. This approach should work with any Python version that supports "dictConfig()" - namely, Python 2.7, 3.2 or later. With pre-3.3 versions, you would need to implement the actual ownership change using e.g. "os.chown()". In practice, the handler-creating function may be in a utility module somewhere in your project. Instead of the line in the configuration: '()': owned_file_handler, you could use e.g.: '()': 'ext://project.util.owned_file_handler', where "project.util" can be replaced with the actual name of the package where the function resides. In the above working script, using "'ext://__main__.owned_file_handler'" should work. Here, the actual callable is resolved by "dictConfig()" from the "ext://" specification. This example hopefully also points the way to how you could implement other types of file change - e.g. setting specific POSIX permission bits - in the same way, using "os.chmod()". Of course, the approach could also be extended to types of handler other than a "FileHandler" - for example, one of the rotating file handlers, or a different type of handler altogether. Using particular formatting styles throughout your application ============================================================== In Python 3.2, the "Formatter" gained a "style" keyword parameter which, while defaulting to "%" for backward compatibility, allowed the specification of "{" or "$" to support the formatting approaches supported by "str.format()" and "string.Template". Note that this governs the formatting of logging messages for final output to logs, and is completely orthogonal to how an individual logging message is constructed. Logging calls ("debug()", "info()" etc.) only take positional parameters for the actual logging message itself, with keyword parameters used only for determining options for how to handle the logging call (e.g. the "exc_info" keyword parameter to indicate that traceback information should be logged, or the "extra" keyword parameter to indicate additional contextual information to be added to the log). So you cannot directly make logging calls using "str.format()" or "string.Template" syntax, because internally the logging package uses %-formatting to merge the format string and the variable arguments. There would be no changing this while preserving backward compatibility, since all logging calls which are out there in existing code will be using %-format strings. There have been suggestions to associate format styles with specific loggers, but that approach also runs into backward compatibility problems because any existing code could be using a given logger name and using %-formatting. For logging to work interoperably between any third-party libraries and your code, decisions about formatting need to be made at the level of the individual logging call. This opens up a couple of ways in which alternative formatting styles can be accommodated. 
Using LogRecord factories ------------------------- In Python 3.2, along with the "Formatter" changes mentioned above, the logging package gained the ability to allow users to set their own "LogRecord" subclasses, using the "setLogRecordFactory()" function. You can use this to set your own subclass of "LogRecord", which does the Right Thing by overriding the "getMessage()" method. The base class implementation of this method is where the "msg % args" formatting happens, and where you can substitute your alternate formatting; however, you should be careful to support all formatting styles and allow %-formatting as the default, to ensure interoperability with other code. Care should also be taken to call "str(self.msg)", just as the base implementation does. Refer to the reference documentation on "setLogRecordFactory()" and "LogRecord" for more information. Using custom message objects ---------------------------- There is another, perhaps simpler way that you can use {}- and $- formatting to construct your individual log messages. You may recall (from Using arbitrary objects as messages) that when logging you can use an arbitrary object as a message format string, and that the logging package will call "str()" on that object to get the actual format string. Consider the following two classes: class BraceMessage: def __init__(self, fmt, /, *args, **kwargs): self.fmt = fmt self.args = args self.kwargs = kwargs def __str__(self): return self.fmt.format(*self.args, **self.kwargs) class DollarMessage: def __init__(self, fmt, /, **kwargs): self.fmt = fmt self.kwargs = kwargs def __str__(self): from string import Template return Template(self.fmt).substitute(**self.kwargs) Either of these can be used in place of a format string, to allow {}- or $-formatting to be used to build the actual “message” part which appears in the formatted log output in place of “%(message)s” or “{message}” or “$message”. If you find it a little unwieldy to use the class names whenever you want to log something, you can make it more palatable if you use an alias such as "M" or "_" for the message (or perhaps "__", if you are using "_" for localization). Examples of this approach are given below. Firstly, formatting with "str.format()": >>> __ = BraceMessage >>> print(__('Message with {0} {1}', 2, 'placeholders')) Message with 2 placeholders >>> class Point: pass ... >>> p = Point() >>> p.x = 0.5 >>> p.y = 0.5 >>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})', point=p)) Message with coordinates: (0.50, 0.50) Secondly, formatting with "string.Template": >>> __ = DollarMessage >>> print(__('Message with $num $what', num=2, what='placeholders')) Message with 2 placeholders >>> One thing to note is that you pay no significant performance penalty with this approach: the actual formatting happens not when you make the logging call, but when (and if) the logged message is actually about to be output to a log by a handler. So the only slightly unusual thing which might trip you up is that the parentheses go around the format string and the arguments, not just the format string. That’s because the __ notation is just syntax sugar for a constructor call to one of the "*XXX*Message" classes shown above. Configuring filters with "dictConfig()" ======================================= You *can* configure filters using "dictConfig()", though it might not be obvious at first glance how to do it (hence this recipe). 
Since "Filter" is the only filter class included in the standard library, and it is unlikely to cater to many requirements (it’s only there as a base class), you will typically need to define your own "Filter" subclass with an overridden "filter()" method. To do this, specify the "()" key in the configuration dictionary for the filter, specifying a callable which will be used to create the filter (a class is the most obvious, but you can provide any callable which returns a "Filter" instance). Here is a complete example: import logging import logging.config import sys class MyFilter(logging.Filter): def __init__(self, param=None): self.param = param def filter(self, record): if self.param is None: allow = True else: allow = self.param not in record.msg if allow: record.msg = 'changed: ' + record.msg return allow LOGGING = { 'version': 1, 'filters': { 'myfilter': { '()': MyFilter, 'param': 'noshow', } }, 'handlers': { 'console': { 'class': 'logging.StreamHandler', 'filters': ['myfilter'] } }, 'root': { 'level': 'DEBUG', 'handlers': ['console'] }, } if __name__ == '__main__': logging.config.dictConfig(LOGGING) logging.debug('hello') logging.debug('hello - noshow') This example shows how you can pass configuration data to the callable which constructs the instance, in the form of keyword parameters. When run, the above script will print: changed: hello which shows that the filter is working as configured. A couple of extra points to note: * If you can’t refer to the callable directly in the configuration (e.g. if it lives in a different module, and you can’t import it directly where the configuration dictionary is), you can use the form "ext://..." as described in Access to external objects. For example, you could have used the text "'ext://__main__.MyFilter'" instead of "MyFilter" in the above example. * As well as for filters, this technique can also be used to configure custom handlers and formatters. See User-defined objects for more information on how logging supports using user-defined objects in its configuration, and see the other cookbook recipe Customizing handlers with dictConfig() above. Customized exception formatting =============================== There might be times when you want to do customized exception formatting - for argument’s sake, let’s say you want exactly one line per logged event, even when exception information is present. You can do this with a custom formatter class, as shown in the following example: import logging class OneLineExceptionFormatter(logging.Formatter): def formatException(self, exc_info): """ Format an exception so that it prints on a single line. 
""" result = super().formatException(exc_info) return repr(result) # or format into one line however you want to def format(self, record): s = super().format(record) if record.exc_text: s = s.replace('\n', '') + '|' return s def configure_logging(): fh = logging.FileHandler('output.txt', 'w') f = OneLineExceptionFormatter('%(asctime)s|%(levelname)s|%(message)s|', '%d/%m/%Y %H:%M:%S') fh.setFormatter(f) root = logging.getLogger() root.setLevel(logging.DEBUG) root.addHandler(fh) def main(): configure_logging() logging.info('Sample message') try: x = 1 / 0 except ZeroDivisionError as e: logging.exception('ZeroDivisionError: %s', e) if __name__ == '__main__': main() When run, this produces a file with exactly two lines: 28/01/2015 07:21:23|INFO|Sample message| 28/01/2015 07:21:23|ERROR|ZeroDivisionError: integer division or modulo by zero|'Traceback (most recent call last):\n File "logtest7.py", line 30, in main\n x = 1 / 0\nZeroDivisionError: integer division or modulo by zero'| While the above treatment is simplistic, it points the way to how exception information can be formatted to your liking. The "traceback" module may be helpful for more specialized needs. Speaking logging messages ========================= There might be situations when it is desirable to have logging messages rendered in an audible rather than a visible format. This is easy to do if you have text-to-speech (TTS) functionality available in your system, even if it doesn’t have a Python binding. Most TTS systems have a command line program you can run, and this can be invoked from a handler using "subprocess". It’s assumed here that TTS command line programs won’t expect to interact with users or take a long time to complete, and that the frequency of logged messages will be not so high as to swamp the user with messages, and that it’s acceptable to have the messages spoken one at a time rather than concurrently, The example implementation below waits for one message to be spoken before the next is processed, and this might cause other handlers to be kept waiting. Here is a short example showing the approach, which assumes that the "espeak" TTS package is available: import logging import subprocess import sys class TTSHandler(logging.Handler): def emit(self, record): msg = self.format(record) # Speak slowly in a female English voice cmd = ['espeak', '-s150', '-ven+f3', msg] p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) # wait for the program to finish p.communicate() def configure_logging(): h = TTSHandler() root = logging.getLogger() root.addHandler(h) # the default formatter just returns the message root.setLevel(logging.DEBUG) def main(): logging.info('Hello') logging.debug('Goodbye') if __name__ == '__main__': configure_logging() sys.exit(main()) When run, this script should say “Hello” and then “Goodbye” in a female voice. The above approach can, of course, be adapted to other TTS systems and even other systems altogether which can process messages via external programs run from a command line. Buffering logging messages and outputting them conditionally ============================================================ There might be situations where you want to log messages in a temporary area and only output them if a certain condition occurs. 
For example, you may want to start logging debug events in a function, and if the function completes without errors, you don’t want to clutter the log with the collected debug information, but if there is an error, you want all the debug information to be output as well as the error. Here is an example which shows how you could do this using a decorator for your functions where you want logging to behave this way. It makes use of the "logging.handlers.MemoryHandler", which allows buffering of logged events until some condition occurs, at which point the buffered events are "flushed" - passed to another handler (the "target" handler) for processing. By default, the "MemoryHandler" is flushed when its buffer gets filled up or an event whose level is greater than or equal to a specified threshold is seen. You can use this recipe with a more specialised subclass of "MemoryHandler" if you want custom flushing behavior. The example script has a simple function, "foo", which just cycles through all the logging levels, writing to "sys.stderr" to say what level it’s about to log at, and then actually logging a message at that level. You can pass a parameter to "foo" which, if true, will log at ERROR and CRITICAL levels - otherwise, it only logs at DEBUG, INFO and WARNING levels. The script just arranges to decorate "foo" with a decorator which will do the conditional logging that’s required. The decorator takes a logger as a parameter and attaches a memory handler for the duration of the call to the decorated function. The decorator can be additionally parameterised using a target handler, a level at which flushing should occur, and a capacity for the buffer (number of records buffered). These default to a "StreamHandler" which writes to "sys.stderr", "logging.ERROR" and "100" respectively.
Here’s the script: import logging from logging.handlers import MemoryHandler import sys logger = logging.getLogger(__name__) logger.addHandler(logging.NullHandler()) def log_if_errors(logger, target_handler=None, flush_level=None, capacity=None): if target_handler is None: target_handler = logging.StreamHandler() if flush_level is None: flush_level = logging.ERROR if capacity is None: capacity = 100 handler = MemoryHandler(capacity, flushLevel=flush_level, target=target_handler) def decorator(fn): def wrapper(*args, **kwargs): logger.addHandler(handler) try: return fn(*args, **kwargs) except Exception: logger.exception('call failed') raise finally: super(MemoryHandler, handler).flush() logger.removeHandler(handler) return wrapper return decorator def write_line(s): sys.stderr.write('%s\n' % s) def foo(fail=False): write_line('about to log at DEBUG ...') logger.debug('Actually logged at DEBUG') write_line('about to log at INFO ...') logger.info('Actually logged at INFO') write_line('about to log at WARNING ...') logger.warning('Actually logged at WARNING') if fail: write_line('about to log at ERROR ...') logger.error('Actually logged at ERROR') write_line('about to log at CRITICAL ...') logger.critical('Actually logged at CRITICAL') return fail decorated_foo = log_if_errors(logger)(foo) if __name__ == '__main__': logger.setLevel(logging.DEBUG) write_line('Calling undecorated foo with False') assert not foo(False) write_line('Calling undecorated foo with True') assert foo(True) write_line('Calling decorated foo with False') assert not decorated_foo(False) write_line('Calling decorated foo with True') assert decorated_foo(True) When this script is run, the following output should be observed: Calling undecorated foo with False about to log at DEBUG ... about to log at INFO ... about to log at WARNING ... Calling undecorated foo with True about to log at DEBUG ... about to log at INFO ... about to log at WARNING ... about to log at ERROR ... about to log at CRITICAL ... Calling decorated foo with False about to log at DEBUG ... about to log at INFO ... about to log at WARNING ... Calling decorated foo with True about to log at DEBUG ... about to log at INFO ... about to log at WARNING ... about to log at ERROR ... Actually logged at DEBUG Actually logged at INFO Actually logged at WARNING Actually logged at ERROR about to log at CRITICAL ... Actually logged at CRITICAL As you can see, actual logging output only occurs when an event is logged whose severity is ERROR or greater, but in that case, any previous events at lower severities are also logged. You can of course use the conventional means of decoration: @log_if_errors(logger) def foo(fail=False): ... Sending logging messages to email, with buffering ================================================= To illustrate how you can send log messages via email, so that a set number of messages are sent per email, you can subclass "BufferingHandler". In the following example, which you can adapt to suit your specific needs, a simple test harness is provided which allows you to run the script with command line arguments specifying what you typically need to send things via SMTP. (Run the downloaded script with the "-h" argument to see the required and optional arguments.) 
import logging import logging.handlers import smtplib class BufferingSMTPHandler(logging.handlers.BufferingHandler): def __init__(self, mailhost, port, username, password, fromaddr, toaddrs, subject, capacity): logging.handlers.BufferingHandler.__init__(self, capacity) self.mailhost = mailhost self.mailport = port self.username = username self.password = password self.fromaddr = fromaddr if isinstance(toaddrs, str): toaddrs = [toaddrs] self.toaddrs = toaddrs self.subject = subject self.setFormatter(logging.Formatter("%(asctime)s %(levelname)-5s %(message)s")) def flush(self): if len(self.buffer) > 0: try: smtp = smtplib.SMTP(self.mailhost, self.mailport) smtp.starttls() smtp.login(self.username, self.password) msg = "From: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n" % (self.fromaddr, ','.join(self.toaddrs), self.subject) for record in self.buffer: s = self.format(record) msg = msg + s + "\r\n" smtp.sendmail(self.fromaddr, self.toaddrs, msg) smtp.quit() except Exception: if logging.raiseExceptions: raise self.buffer = [] if __name__ == '__main__': import argparse ap = argparse.ArgumentParser() aa = ap.add_argument aa('host', metavar='HOST', help='SMTP server') aa('--port', '-p', type=int, default=587, help='SMTP port') aa('user', metavar='USER', help='SMTP username') aa('password', metavar='PASSWORD', help='SMTP password') aa('to', metavar='TO', help='Addressee for emails') aa('sender', metavar='SENDER', help='Sender email address') aa('--subject', '-s', default='Test Logging email from Python logging module (buffering)', help='Subject of email') options = ap.parse_args() logger = logging.getLogger() logger.setLevel(logging.DEBUG) h = BufferingSMTPHandler(options.host, options.port, options.user, options.password, options.sender, options.to, options.subject, 10) logger.addHandler(h) for i in range(102): logger.info("Info index = %d", i) h.flush() h.close() If you run this script and your SMTP server is correctly set up, you should find that it sends eleven emails to the addressee you specify. The first ten emails will each have ten log messages, and the eleventh will have two messages. That makes up 102 messages as specified in the script. Formatting times using UTC (GMT) via configuration ================================================== Sometimes you want to format times using UTC, which can be done using a class such as "UTCFormatter", shown below: import logging import time class UTCFormatter(logging.Formatter): converter = time.gmtime and you can then use the "UTCFormatter" in your code instead of "Formatter". 
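For example, here is a minimal, self-contained sketch of using such a formatter directly in code:

import logging
import time

class UTCFormatter(logging.Formatter):
    converter = time.gmtime

handler = logging.StreamHandler()
handler.setFormatter(UTCFormatter('%(asctime)s %(message)s'))
root = logging.getLogger()
root.addHandler(handler)
root.warning('This timestamp is in UTC')  # asctime is computed via time.gmtime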
If you want to do that via configuration, you can use the "dictConfig()" API with an approach illustrated by the following complete example: import logging import logging.config import time class UTCFormatter(logging.Formatter): converter = time.gmtime LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'utc': { '()': UTCFormatter, 'format': '%(asctime)s %(message)s', }, 'local': { 'format': '%(asctime)s %(message)s', } }, 'handlers': { 'console1': { 'class': 'logging.StreamHandler', 'formatter': 'utc', }, 'console2': { 'class': 'logging.StreamHandler', 'formatter': 'local', }, }, 'root': { 'handlers': ['console1', 'console2'], } } if __name__ == '__main__': logging.config.dictConfig(LOGGING) logging.warning('The local time is %s', time.asctime()) When this script is run, it should print something like: 2015-10-17 12:53:29,501 The local time is Sat Oct 17 13:53:29 2015 2015-10-17 13:53:29,501 The local time is Sat Oct 17 13:53:29 2015 showing how the time is formatted both as local time and UTC, one for each handler. Using a context manager for selective logging ============================================= There are times when it would be useful to temporarily change the logging configuration and revert it back after doing something. For this, a context manager is the most obvious way of saving and restoring the logging context. Here is a simple example of such a context manager, which allows you to optionally change the logging level and add a logging handler purely in the scope of the context manager: import logging import sys class LoggingContext: def __init__(self, logger, level=None, handler=None, close=True): self.logger = logger self.level = level self.handler = handler self.close = close def __enter__(self): if self.level is not None: self.old_level = self.logger.level self.logger.setLevel(self.level) if self.handler: self.logger.addHandler(self.handler) def __exit__(self, et, ev, tb): if self.level is not None: self.logger.setLevel(self.old_level) if self.handler: self.logger.removeHandler(self.handler) if self.handler and self.close: self.handler.close() # implicit return of None => don't swallow exceptions If you specify a level value, the logger’s level is set to that value in the scope of the with block covered by the context manager. If you specify a handler, it is added to the logger on entry to the block and removed on exit from the block. You can also ask the manager to close the handler for you on block exit - you could do this if you don’t need the handler any more. To illustrate how it works, we can add the following block of code to the above: if __name__ == '__main__': logger = logging.getLogger('foo') logger.addHandler(logging.StreamHandler()) logger.setLevel(logging.INFO) logger.info('1. This should appear just once on stderr.') logger.debug('2. This should not appear.') with LoggingContext(logger, level=logging.DEBUG): logger.debug('3. This should appear once on stderr.') logger.debug('4. This should not appear.') h = logging.StreamHandler(sys.stdout) with LoggingContext(logger, level=logging.DEBUG, handler=h, close=True): logger.debug('5. This should appear twice - once on stderr and once on stdout.') logger.info('6. This should appear just once on stderr.') logger.debug('7. This should not appear.') We initially set the logger’s level to "INFO", so message #1 appears and message #2 doesn’t. We then change the level to "DEBUG" temporarily in the following "with" block, and so message #3 appears. 
After the block exits, the logger’s level is restored to "INFO" and so message #4 doesn’t appear. In the next "with" block, we set the level to "DEBUG" again but also add a handler writing to "sys.stdout". Thus, message #5 appears twice on the console (once via "stderr" and once via "stdout"). After the "with" statement’s completion, the status is as it was before so message #6 appears (like message #1) whereas message #7 doesn’t (just like message #2). If we run the resulting script, the result is as follows: $ python logctx.py 1. This should appear just once on stderr. 3. This should appear once on stderr. 5. This should appear twice - once on stderr and once on stdout. 5. This should appear twice - once on stderr and once on stdout. 6. This should appear just once on stderr. If we run it again, but pipe "stderr" to "/dev/null", we see the following, which is the only message written to "stdout": $ python logctx.py 2>/dev/null 5. This should appear twice - once on stderr and once on stdout. Once again, but piping "stdout" to "/dev/null", we get: $ python logctx.py >/dev/null 1. This should appear just once on stderr. 3. This should appear once on stderr. 5. This should appear twice - once on stderr and once on stdout. 6. This should appear just once on stderr. In this case, the message #5 printed to "stdout" doesn’t appear, as expected. Of course, the approach described here can be generalised, for example to attach logging filters temporarily. Note that the above code works in Python 2 as well as Python 3. A CLI application starter template ================================== Here’s an example which shows how you can: * Use a logging level based on command-line arguments * Dispatch to multiple subcommands in separate files, all logging at the same level in a consistent way * Make use of simple, minimal configuration Suppose we have a command-line application whose job is to stop, start or restart some services. This could be organised for the purposes of illustration as a file "app.py" that is the main script for the application, with individual commands implemented in "start.py", "stop.py" and "restart.py". Suppose further that we want to control the verbosity of the application via a command-line argument, defaulting to "logging.INFO". Here’s one way that "app.py" could be written: import argparse import importlib import logging import os import sys def main(args=None): scriptname = os.path.basename(__file__) parser = argparse.ArgumentParser(scriptname) levels = ('DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL') parser.add_argument('--log-level', default='INFO', choices=levels) subparsers = parser.add_subparsers(dest='command', help='Available commands:') start_cmd = subparsers.add_parser('start', help='Start a service') start_cmd.add_argument('name', metavar='NAME', help='Name of service to start') stop_cmd = subparsers.add_parser('stop', help='Stop one or more services') stop_cmd.add_argument('names', metavar='NAME', nargs='+', help='Name of service to stop') restart_cmd = subparsers.add_parser('restart', help='Restart one or more services') restart_cmd.add_argument('names', metavar='NAME', nargs='+', help='Name of service to restart') options = parser.parse_args() # the code to dispatch commands could all be in this file. For the purposes # of illustration only, we implement each command in a separate module. 
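    # Each subcommand module (start.py, stop.py, restart.py) is expected to
    # define a callable named 'command'; we import the module named after the
    # chosen subcommand and look that callable up.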
try: mod = importlib.import_module(options.command) cmd = getattr(mod, 'command') except (ImportError, AttributeError): print('Unable to find the code for command \'%s\'' % options.command) return 1 # Could get fancy here and load configuration from file or dictionary logging.basicConfig(level=options.log_level, format='%(levelname)s %(name)s %(message)s') cmd(options) if __name__ == '__main__': sys.exit(main()) And the "start", "stop" and "restart" commands can be implemented in separate modules, like so for starting: # start.py import logging logger = logging.getLogger(__name__) def command(options): logger.debug('About to start %s', options.name) # actually do the command processing here ... logger.info('Started the \'%s\' service.', options.name) and thus for stopping: # stop.py import logging logger = logging.getLogger(__name__) def command(options): n = len(options.names) if n == 1: plural = '' services = '\'%s\'' % options.names[0] else: plural = 's' services = ', '.join('\'%s\'' % name for name in options.names) i = services.rfind(', ') services = services[:i] + ' and ' + services[i + 2:] logger.debug('About to stop %s', services) # actually do the command processing here ... logger.info('Stopped the %s service%s.', services, plural) and similarly for restarting: # restart.py import logging logger = logging.getLogger(__name__) def command(options): n = len(options.names) if n == 1: plural = '' services = '\'%s\'' % options.names[0] else: plural = 's' services = ', '.join('\'%s\'' % name for name in options.names) i = services.rfind(', ') services = services[:i] + ' and ' + services[i + 2:] logger.debug('About to restart %s', services) # actually do the command processing here ... logger.info('Restarted the %s service%s.', services, plural) If we run this application with the default log level, we get output like this: $ python app.py start foo INFO start Started the 'foo' service. $ python app.py stop foo bar INFO stop Stopped the 'foo' and 'bar' services. $ python app.py restart foo bar baz INFO restart Restarted the 'foo', 'bar' and 'baz' services. The first word is the logging level, and the second word is the module or package name of the place where the event was logged. If we change the logging level, then we can change the information sent to the log. For example, if we want more information: $ python app.py --log-level DEBUG start foo DEBUG start About to start foo INFO start Started the 'foo' service. $ python app.py --log-level DEBUG stop foo bar DEBUG stop About to stop 'foo' and 'bar' INFO stop Stopped the 'foo' and 'bar' services. $ python app.py --log-level DEBUG restart foo bar baz DEBUG restart About to restart 'foo', 'bar' and 'baz' INFO restart Restarted the 'foo', 'bar' and 'baz' services. And if we want less: $ python app.py --log-level WARNING start foo $ python app.py --log-level WARNING stop foo bar $ python app.py --log-level WARNING restart foo bar baz In this case, the commands don’t print anything to the console, since nothing at "WARNING" level or above is logged by them. A Qt GUI for logging ==================== A question that comes up from time to time is about how to log to a GUI application. The Qt framework is a popular cross-platform UI framework with Python bindings using PySide2 or PyQt5 libraries. The following example shows how to log to a Qt GUI. This introduces a simple "QtHandler" class which takes a callable, which should be a slot in the main thread that does GUI updates. 
A worker thread is also created to show how you can log to the GUI from both the UI itself (via a button for manual logging) as well as a worker thread doing work in the background (here, just logging messages at random levels with random short delays in between). The worker thread is implemented using Qt’s "QThread" class rather than the "threading" module, as there are circumstances where one has to use "QThread", which offers better integration with other "Qt" components. The code should work with recent releases of any of "PySide6", "PyQt6", "PySide2" or "PyQt5". You should be able to adapt the approach to earlier versions of Qt. Please refer to the comments in the code snippet for more detailed information. import datetime import logging import random import sys import time # Deal with minor differences between different Qt packages try: from PySide6 import QtCore, QtGui, QtWidgets Signal = QtCore.Signal Slot = QtCore.Slot except ImportError: try: from PyQt6 import QtCore, QtGui, QtWidgets Signal = QtCore.pyqtSignal Slot = QtCore.pyqtSlot except ImportError: try: from PySide2 import QtCore, QtGui, QtWidgets Signal = QtCore.Signal Slot = QtCore.Slot except ImportError: from PyQt5 import QtCore, QtGui, QtWidgets Signal = QtCore.pyqtSignal Slot = QtCore.pyqtSlot logger = logging.getLogger(__name__) # # Signals need to be contained in a QObject or subclass in order to be correctly # initialized. # class Signaller(QtCore.QObject): signal = Signal(str, logging.LogRecord) # # Output to a Qt GUI is only supposed to happen on the main thread. So, this # handler is designed to take a slot function which is set up to run in the main # thread. In this example, the function takes a string argument which is a # formatted log message, and the log record which generated it. The formatted # string is just a convenience - you could format a string for output any way # you like in the slot function itself. # # You specify the slot function to do whatever GUI updates you want. The handler # doesn't know or care about specific UI elements. # class QtHandler(logging.Handler): def __init__(self, slotfunc, *args, **kwargs): super().__init__(*args, **kwargs) self.signaller = Signaller() self.signaller.signal.connect(slotfunc) def emit(self, record): s = self.format(record) self.signaller.signal.emit(s, record) # # This example uses QThreads, which means that the threads at the Python level # are named something like "Dummy-1". The function below gets the Qt name of the # current thread. # def ctname(): return QtCore.QThread.currentThread().objectName() # # Used to generate random levels for logging. # LEVELS = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL) # # This worker class represents work that is done in a thread separate to the # main thread. The way the thread is kicked off to do work is via a button press # that connects to a slot in the worker. # # Because the default threadName value in the LogRecord isn't much use, we add # a qThreadName which contains the QThread name as computed above, and pass that # value in an "extra" dictionary which is used to update the LogRecord with the # QThread name. # # This example worker just outputs messages sequentially, interspersed with # random delays of the order of a few seconds. # class Worker(QtCore.QObject): @Slot() def start(self): extra = {'qThreadName': ctname() } logger.debug('Started work', extra=extra) i = 1 # Let the thread run until interrupted. This allows reasonably clean # thread termination. 
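        # The UI requests interruption via QThread.requestInterruption() when
        # shutting down (see kill_thread below); checking for it here lets the
        # loop terminate cleanly.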
while not QtCore.QThread.currentThread().isInterruptionRequested(): delay = 0.5 + random.random() * 2 time.sleep(delay) try: if random.random() < 0.1: raise ValueError('Exception raised: %d' % i) else: level = random.choice(LEVELS) logger.log(level, 'Message after delay of %3.1f: %d', delay, i, extra=extra) except ValueError as e: logger.exception('Failed: %s', e, extra=extra) i += 1 # # Implement a simple UI for this cookbook example. This contains: # # * A read-only text edit window which holds formatted log messages # * A button to start work and log stuff in a separate thread # * A button to log something from the main thread # * A button to clear the log window # class Window(QtWidgets.QWidget): COLORS = { logging.DEBUG: 'black', logging.INFO: 'blue', logging.WARNING: 'orange', logging.ERROR: 'red', logging.CRITICAL: 'purple', } def __init__(self, app): super().__init__() self.app = app self.textedit = te = QtWidgets.QPlainTextEdit(self) # Set whatever the default monospace font is for the platform f = QtGui.QFont('nosuchfont') if hasattr(f, 'Monospace'): f.setStyleHint(f.Monospace) else: f.setStyleHint(f.StyleHint.Monospace) # for Qt6 te.setFont(f) te.setReadOnly(True) PB = QtWidgets.QPushButton self.work_button = PB('Start background work', self) self.log_button = PB('Log a message at a random level', self) self.clear_button = PB('Clear log window', self) self.handler = h = QtHandler(self.update_status) # Remember to use qThreadName rather than threadName in the format string. fs = '%(asctime)s %(qThreadName)-12s %(levelname)-8s %(message)s' formatter = logging.Formatter(fs) h.setFormatter(formatter) logger.addHandler(h) # Set up to terminate the QThread when we exit app.aboutToQuit.connect(self.force_quit) # Lay out all the widgets layout = QtWidgets.QVBoxLayout(self) layout.addWidget(te) layout.addWidget(self.work_button) layout.addWidget(self.log_button) layout.addWidget(self.clear_button) self.setFixedSize(900, 400) # Connect the non-worker slots and signals self.log_button.clicked.connect(self.manual_update) self.clear_button.clicked.connect(self.clear_display) # Start a new worker thread and connect the slots for the worker self.start_thread() self.work_button.clicked.connect(self.worker.start) # Once started, the button should be disabled self.work_button.clicked.connect(lambda : self.work_button.setEnabled(False)) def start_thread(self): self.worker = Worker() self.worker_thread = QtCore.QThread() self.worker.setObjectName('Worker') self.worker_thread.setObjectName('WorkerThread') # for qThreadName self.worker.moveToThread(self.worker_thread) # This will start an event loop in the worker thread self.worker_thread.start() def kill_thread(self): # Just tell the worker to stop, then tell it to quit and wait for that # to happen self.worker_thread.requestInterruption() if self.worker_thread.isRunning(): self.worker_thread.quit() self.worker_thread.wait() else: print('worker has already exited.') def force_quit(self): # For use when the window is closed if self.worker_thread.isRunning(): self.kill_thread() # The functions below update the UI and run in the main thread because # that's where the slots are set up @Slot(str, logging.LogRecord) def update_status(self, status, record): color = self.COLORS.get(record.levelno, 'black') s = '
<pre><font color="%s">%s</font></pre>
' % (color, status) self.textedit.appendHtml(s) @Slot() def manual_update(self): # This function uses the formatted message passed in, but also uses # information from the record to format the message in an appropriate # color according to its severity (level). level = random.choice(LEVELS) extra = {'qThreadName': ctname() } logger.log(level, 'Manually logged!', extra=extra) @Slot() def clear_display(self): self.textedit.clear() def main(): QtCore.QThread.currentThread().setObjectName('MainThread') logging.getLogger().setLevel(logging.DEBUG) app = QtWidgets.QApplication(sys.argv) example = Window(app) example.show() if hasattr(app, 'exec'): rc = app.exec() else: rc = app.exec_() sys.exit(rc) if __name__=='__main__': main() Logging to syslog with RFC5424 support ====================================== Although **RFC 5424** dates from 2009, most syslog servers are configured by default to use the older **RFC 3164**, which hails from 2001. When "logging" was added to Python in 2003, it supported the earlier (and only existing) protocol at the time. Since RFC 5424 came out, there has not been widespread deployment of it in syslog servers, and so the "SysLogHandler" functionality has not been updated. RFC 5424 contains some useful features such as support for structured data, and if you need to be able to log to a syslog server with support for it, you can do so with a subclassed handler which looks something like this: import datetime import logging.handlers import re import socket import time class SysLogHandler5424(logging.handlers.SysLogHandler): tz_offset = re.compile(r'([+-]\d{2})(\d{2})$') escaped = re.compile(r'([\]"\\])') def __init__(self, *args, **kwargs): self.msgid = kwargs.pop('msgid', None) self.appname = kwargs.pop('appname', None) super().__init__(*args, **kwargs) def format(self, record): version = 1 asctime = datetime.datetime.fromtimestamp(record.created).isoformat() m = self.tz_offset.match(time.strftime('%z')) has_offset = False if m and time.timezone: hrs, mins = m.groups() if int(hrs) or int(mins): has_offset = True if not has_offset: asctime += 'Z' else: asctime += f'{hrs}:{mins}' try: hostname = socket.gethostname() except Exception: hostname = '-' appname = self.appname or '-' procid = record.process msgid = '-' msg = super().format(record) sdata = '-' if hasattr(record, 'structured_data'): sd = record.structured_data # This should be a dict where the keys are SD-ID and the value is a # dict mapping PARAM-NAME to PARAM-VALUE (refer to the RFC for what these # mean) # There's no error checking here - it's purely for illustration, and you # can adapt this code for use in production environments parts = [] def replacer(m): g = m.groups() return '\\' + g[0] for sdid, dv in sd.items(): part = f'[{sdid}' for k, v in dv.items(): s = str(v) s = self.escaped.sub(replacer, s) part += f' {k}="{s}"' part += ']' parts.append(part) sdata = ''.join(parts) return f'{version} {asctime} {hostname} {appname} {procid} {msgid} {sdata} {msg}' You’ll need to be familiar with RFC 5424 to fully understand the above code, and it may be that you have slightly different needs (e.g. for how you pass structured data to the log). Nevertheless, the above should be adaptable to your specific needs.
With the above handler, you’d pass structured data using something like this: sd = { 'foo@12345': {'bar': 'baz', 'baz': 'bozz', 'fizz': r'buzz'}, 'foo@54321': {'rab': 'baz', 'zab': 'bozz', 'zzif': r'buzz'} } extra = {'structured_data': sd} i = 1 logger.debug('Message %d', i, extra=extra) How to treat a logger like an output stream =========================================== Sometimes, you need to interface to a third-party API which expects a file-like object to write to, but you want to direct the API’s output to a logger. You can do this using a class which wraps a logger with a file-like API. Here’s a short script illustrating such a class: import logging class LoggerWriter: def __init__(self, logger, level): self.logger = logger self.level = level def write(self, message): if message != '\n': # avoid printing bare newlines, if you like self.logger.log(self.level, message) def flush(self): # doesn't actually do anything, but might be expected of a file-like # object - so optional depending on your situation pass def close(self): # doesn't actually do anything, but might be expected of a file-like # object - so optional depending on your situation. You might want # to set a flag so that later calls to write raise an exception pass def main(): logging.basicConfig(level=logging.DEBUG) logger = logging.getLogger('demo') info_fp = LoggerWriter(logger, logging.INFO) debug_fp = LoggerWriter(logger, logging.DEBUG) print('An INFO message', file=info_fp) print('A DEBUG message', file=debug_fp) if __name__ == "__main__": main() When this script is run, it prints INFO:demo:An INFO message DEBUG:demo:A DEBUG message You could also use "LoggerWriter" to redirect "sys.stdout" and "sys.stderr" by doing something like this: import sys sys.stdout = LoggerWriter(logger, logging.INFO) sys.stderr = LoggerWriter(logger, logging.WARNING) You should do this *after* configuring logging for your needs. In the above example, the "basicConfig()" call does this (using the "sys.stderr" value *before* it is overwritten by a "LoggerWriter" instance). Then, you’d get this kind of result: >>> print('Foo') INFO:demo:Foo >>> print('Bar', file=sys.stderr) WARNING:demo:Bar >>> Of course, the examples above show output according to the format used by "basicConfig()", but you can use a different formatter when you configure logging. Note that with the above scheme, you are somewhat at the mercy of buffering and the sequence of write calls which you are intercepting. For example, with the definition of "LoggerWriter" above, if you have the snippet sys.stderr = LoggerWriter(logger, logging.WARNING) 1 / 0 then running the script results in WARNING:demo:Traceback (most recent call last): WARNING:demo: File "/home/runner/cookbook-loggerwriter/test.py", line 53, in <module> WARNING:demo: WARNING:demo:main() WARNING:demo: File "/home/runner/cookbook-loggerwriter/test.py", line 49, in main WARNING:demo: WARNING:demo:1 / 0 WARNING:demo:ZeroDivisionError WARNING:demo:: WARNING:demo:division by zero As you can see, this output isn’t ideal. That’s because the underlying code which writes to "sys.stderr" makes multiple writes, each of which results in a separate logged line (for example, the last three lines above). To get around this problem, you need to buffer things and only output log lines when newlines are seen.
Let’s use a slightly better implementation of "LoggerWriter": class BufferingLoggerWriter(LoggerWriter): def __init__(self, logger, level): super().__init__(logger, level) self.buffer = '' def write(self, message): if '\n' not in message: self.buffer += message else: parts = message.split('\n') if self.buffer: s = self.buffer + parts.pop(0) self.logger.log(self.level, s) self.buffer = parts.pop() for part in parts: self.logger.log(self.level, part) This just buffers up stuff until a newline is seen, and then logs complete lines. With this approach, you get better output: WARNING:demo:Traceback (most recent call last): WARNING:demo: File "/home/runner/cookbook-loggerwriter/main.py", line 55, in <module> WARNING:demo: main() WARNING:demo: File "/home/runner/cookbook-loggerwriter/main.py", line 52, in main WARNING:demo: 1/0 WARNING:demo:ZeroDivisionError: division by zero Patterns to avoid ================= Although the preceding sections have described ways of doing things you might need to do or deal with, it is worth mentioning some usage patterns which are *unhelpful*, and which should therefore be avoided in most cases. The following sections are in no particular order. Opening the same log file multiple times ---------------------------------------- On Windows, you will generally not be able to open the same file multiple times as this will lead to a “file is in use by another process” error. However, on POSIX platforms you’ll not get any errors if you open the same file multiple times. This could be done accidentally, for example by: * Adding a file handler more than once which references the same file (e.g. by a copy/paste/forget-to-change error). * Opening two files that look different, as they have different names, but are the same because one is a symbolic link to the other. * Forking a process, following which both parent and child have a reference to the same file. This might be through use of the "multiprocessing" module, for example. Opening a file multiple times might *appear* to work most of the time, but can lead to a number of problems in practice: * Logging output can be garbled because multiple threads or processes try to write to the same file. Although logging guards against concurrent use of the same handler instance by multiple threads, there is no such protection if concurrent writes are attempted by two different threads using two different handler instances which happen to point to the same file. * An attempt to delete a file (e.g. during file rotation) silently fails, because there is another reference pointing to it. This can lead to confusion and wasted debugging time - log entries end up in unexpected places, or are lost altogether. Or a file that was supposed to be moved remains in place, and grows in size unexpectedly despite size-based rotation being supposedly in place. Use the techniques outlined in Logging to a single file from multiple processes to circumvent such issues. Using loggers as attributes in a class or passing them as parameters -------------------------------------------------------------------- While there might be unusual cases where you’ll need to do this, in general there is no point because loggers are singletons. Code can always access a given logger instance by name using "logging.getLogger(name)", so passing instances around and holding them as instance attributes is pointless. Note that in other languages such as Java and C#, loggers are often static class attributes.
However, this pattern doesn’t make sense in Python, where the module (and not the class) is the unit of software decomposition. Adding handlers other than "NullHandler" to a logger in a library ----------------------------------------------------------------- Configuring logging by adding handlers, formatters and filters is the responsibility of the application developer, not the library developer. If you are maintaining a library, ensure that you don’t add handlers to any of your loggers other than a "NullHandler" instance. Creating a lot of loggers ------------------------- Loggers are singletons that are never freed during a script execution, and so creating lots of loggers will use up memory which can’t then be freed. Rather than create a logger per e.g. file processed or network connection made, use the existing mechanisms for passing contextual information into your logs and restrict the loggers created to those describing areas within your application (generally modules, but occasionally slightly more fine-grained than that). Other resources =============== See also: Module "logging" API reference for the logging module. Module "logging.config" Configuration API for the logging module. Module "logging.handlers" Useful handlers included with the logging module. Basic Tutorial Advanced Tutorial Logging HOWTO ************* Author: Vinay Sajip This page contains tutorial information. For links to reference information and a logging cookbook, please see Other resources. Basic Logging Tutorial ====================== Logging is a means of tracking events that happen when some software runs. The software’s developer adds logging calls to their code to indicate that certain events have occurred. An event is described by a descriptive message which can optionally contain variable data (i.e. data that is potentially different for each occurrence of the event). Events also have an importance which the developer ascribes to the event; the importance can also be called the *level* or *severity*. When to use logging ------------------- You can access logging functionality by creating a logger via "logger = getLogger(__name__)", and then calling the logger’s "debug()", "info()", "warning()", "error()" and "critical()" methods. To determine when to use logging, and to see which logger methods to use when, see the table below. It states, for each of a set of common tasks, the best tool to use for that task. +---------------------------------------+----------------------------------------+ | Task you want to perform | The best tool for the task | |=======================================|========================================| | Display console output for ordinary | "print()" | | usage of a command line script or | | | program | | +---------------------------------------+----------------------------------------+ | Report events that occur during | A logger’s "info()" (or "debug()" | | normal operation of a program (e.g. 
| method for very detailed output for | | for status monitoring or fault | diagnostic purposes) | | investigation) | | +---------------------------------------+----------------------------------------+ | Issue a warning regarding a | "warnings.warn()" in library code if | | particular runtime event | the issue is avoidable and the client | | | application should be modified to | | | eliminate the warning A logger’s | | | "warning()" method if there is nothing | | | the client application can do about | | | the situation, but the event should | | | still be noted | +---------------------------------------+----------------------------------------+ | Report an error regarding a | Raise an exception | | particular runtime event | | +---------------------------------------+----------------------------------------+ | Report suppression of an error | A logger’s "error()", "exception()" or | | without raising an exception (e.g. | "critical()" method as appropriate for | | error handler in a long-running | the specific error and application | | server process) | domain | +---------------------------------------+----------------------------------------+ The logger methods are named after the level or severity of the events they are used to track. The standard levels and their applicability are described below (in increasing order of severity): +----------------+-----------------------------------------------+ | Level | When it’s used | |================|===============================================| | "DEBUG" | Detailed information, typically of interest | | | only when diagnosing problems. | +----------------+-----------------------------------------------+ | "INFO" | Confirmation that things are working as | | | expected. | +----------------+-----------------------------------------------+ | "WARNING" | An indication that something unexpected | | | happened, or indicative of some problem in | | | the near future (e.g. ‘disk space low’). The | | | software is still working as expected. | +----------------+-----------------------------------------------+ | "ERROR" | Due to a more serious problem, the software | | | has not been able to perform some function. | +----------------+-----------------------------------------------+ | "CRITICAL" | A serious error, indicating that the program | | | itself may be unable to continue running. | +----------------+-----------------------------------------------+ The default level is "WARNING", which means that only events of this severity and higher will be tracked, unless the logging package is configured to do otherwise. Events that are tracked can be handled in different ways. The simplest way of handling tracked events is to print them to the console. Another common way is to write them to a disk file. A simple example ---------------- A very simple example is: import logging logging.warning('Watch out!') # will print a message to the console logging.info('I told you so') # will not print anything If you type these lines into a script and run it, you’ll see: WARNING:root:Watch out! printed out on the console. The "INFO" message doesn’t appear because the default level is "WARNING". The printed message includes the indication of the level and the description of the event provided in the logging call, i.e. ‘Watch out!’. The actual output can be formatted quite flexibly if you need that; formatting options will also be explained later. 
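If you’re wondering why the "INFO" message was discarded, you can ask the root logger for its effective level; a minimal interactive sketch (an aside, not part of the example above):

   import logging

   # Until configured otherwise, the root logger reports WARNING (30),
   # which is why the INFO event (20) produced no output.
   print(logging.getLogger().getEffectiveLevel())   # 30
   print(logging.INFO)                              # 20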
Notice that in this example, we use functions directly on the "logging" module, like "logging.debug", rather than creating a logger and calling functions on it. These functions operate on the root logger, but can be useful as they will call "basicConfig()" for you if it has not been called yet, as in this example. In larger programs you’ll usually want to control the logging configuration explicitly, however - so for that reason as well as others, it’s better to create loggers and call their methods.

Logging to a file
-----------------

A very common situation is that of recording logging events in a file, so let’s look at that next. Be sure to try the following in a newly started Python interpreter, and don’t just continue from the session described above:

   import logging
   logger = logging.getLogger(__name__)
   logging.basicConfig(filename='example.log', encoding='utf-8', level=logging.DEBUG)
   logger.debug('This message should go to the log file')
   logger.info('So should this')
   logger.warning('And this, too')
   logger.error('And non-ASCII stuff, too, like Øresund and Malmö')

Changed in version 3.9: The *encoding* argument was added. In earlier Python versions, or if not specified, the encoding used is the default value used by "open()". While not shown in the above example, an *errors* argument can also now be passed, which determines how encoding errors are handled. For available values and the default, see the documentation for "open()".

And now if we open the file and look at what we have, we should find the log messages:

   DEBUG:__main__:This message should go to the log file
   INFO:__main__:So should this
   WARNING:__main__:And this, too
   ERROR:__main__:And non-ASCII stuff, too, like Øresund and Malmö

This example also shows how you can set the logging level which acts as the threshold for tracking. In this case, because we set the threshold to "DEBUG", all of the messages were printed.

If you want to set the logging level from a command-line option such as:

   --log=INFO

and you have the value of the parameter passed for "--log" in some variable *loglevel*, you can use:

   getattr(logging, loglevel.upper())

to get the value which you’ll pass to "basicConfig()" via the *level* argument. You may want to error-check any user input value, perhaps as in the following example:

   # assuming loglevel is bound to the string value obtained from the
   # command line argument. Convert to upper case to allow the user to
   # specify --log=DEBUG or --log=debug
   numeric_level = getattr(logging, loglevel.upper(), None)
   if not isinstance(numeric_level, int):
       raise ValueError('Invalid log level: %s' % loglevel)
   logging.basicConfig(level=numeric_level, ...)

The call to "basicConfig()" should come *before* any calls to a logger’s methods such as "debug()", "info()", etc. Otherwise, that logging event may not be handled in the desired manner.

If you run the above script several times, the messages from successive runs are appended to the file *example.log*. If you want each run to start afresh, not remembering the messages from earlier runs, you can specify the *filemode* argument, by changing the call in the above example to:

   logging.basicConfig(filename='example.log', filemode='w', level=logging.DEBUG)

The output will be the same as before, but the log file is no longer appended to, so the messages from earlier runs are lost.

Logging variable data
---------------------

To log variable data, use a format string for the event description message and append the variable data as arguments.
For example: import logging logging.warning('%s before you %s', 'Look', 'leap!') will display: WARNING:root:Look before you leap! As you can see, merging of variable data into the event description message uses the old, %-style of string formatting. This is for backwards compatibility: the logging package pre-dates newer formatting options such as "str.format()" and "string.Template". These newer formatting options *are* supported, but exploring them is outside the scope of this tutorial: see Using particular formatting styles throughout your application for more information. Changing the format of displayed messages ----------------------------------------- To change the format which is used to display messages, you need to specify the format you want to use: import logging logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG) logging.debug('This message should appear on the console') logging.info('So should this') logging.warning('And this, too') which would print: DEBUG:This message should appear on the console INFO:So should this WARNING:And this, too Notice that the ‘root’ which appeared in earlier examples has disappeared. For a full set of things that can appear in format strings, you can refer to the documentation for LogRecord attributes, but for simple usage, you just need the *levelname* (severity), *message* (event description, including variable data) and perhaps to display when the event occurred. This is described in the next section. Displaying the date/time in messages ------------------------------------ To display the date and time of an event, you would place ‘%(asctime)s’ in your format string: import logging logging.basicConfig(format='%(asctime)s %(message)s') logging.warning('is when this event was logged.') which should print something like this: 2010-12-12 11:41:42,612 is when this event was logged. The default format for date/time display (shown above) is like ISO8601 or **RFC 3339**. If you need more control over the formatting of the date/time, provide a *datefmt* argument to "basicConfig", as in this example: import logging logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p') logging.warning('is when this event was logged.') which would display something like this: 12/12/2010 11:46:36 AM is when this event was logged. The format of the *datefmt* argument is the same as supported by "time.strftime()". Next Steps ---------- That concludes the basic tutorial. It should be enough to get you up and running with logging. There’s a lot more that the logging package offers, but to get the best out of it, you’ll need to invest a little more of your time in reading the following sections. If you’re ready for that, grab some of your favourite beverage and carry on. If your logging needs are simple, then use the above examples to incorporate logging into your own scripts, and if you run into problems or don’t understand something, please post a question on the comp.lang.python Usenet group (available at https://groups.google.com/g/comp.lang.python) and you should receive help before too long. Still here? You can carry on reading the next few sections, which provide a slightly more advanced/in-depth tutorial than the basic one above. After that, you can take a look at the Logging Cookbook. Advanced Logging Tutorial ========================= The logging library takes a modular approach and offers several categories of components: loggers, handlers, filters, and formatters. 
* Loggers expose the interface that application code directly uses. * Handlers send the log records (created by loggers) to the appropriate destination. * Filters provide a finer grained facility for determining which log records to output. * Formatters specify the layout of log records in the final output. Log event information is passed between loggers, handlers, filters and formatters in a "LogRecord" instance. Logging is performed by calling methods on instances of the "Logger" class (hereafter called *loggers*). Each instance has a name, and they are conceptually arranged in a namespace hierarchy using dots (periods) as separators. For example, a logger named ‘scan’ is the parent of loggers ‘scan.text’, ‘scan.html’ and ‘scan.pdf’. Logger names can be anything you want, and indicate the area of an application in which a logged message originates. A good convention to use when naming loggers is to use a module-level logger, in each module which uses logging, named as follows: logger = logging.getLogger(__name__) This means that logger names track the package/module hierarchy, and it’s intuitively obvious where events are logged just from the logger name. The root of the hierarchy of loggers is called the root logger. That’s the logger used by the functions "debug()", "info()", "warning()", "error()" and "critical()", which just call the same-named method of the root logger. The functions and the methods have the same signatures. The root logger’s name is printed as ‘root’ in the logged output. It is, of course, possible to log messages to different destinations. Support is included in the package for writing log messages to files, HTTP GET/POST locations, email via SMTP, generic sockets, queues, or OS-specific logging mechanisms such as syslog or the Windows NT event log. Destinations are served by *handler* classes. You can create your own log destination class if you have special requirements not met by any of the built-in handler classes. By default, no destination is set for any logging messages. You can specify a destination (such as console or file) by using "basicConfig()" as in the tutorial examples. If you call the functions "debug()", "info()", "warning()", "error()" and "critical()", they will check to see if no destination is set; and if one is not set, they will set a destination of the console ("sys.stderr") and a default format for the displayed message before delegating to the root logger to do the actual message output. The default format set by "basicConfig()" for messages is: severity:logger name:message You can change this by passing a format string to "basicConfig()" with the *format* keyword argument. For all options regarding how a format string is constructed, see Formatter Objects. Logging Flow ------------ The flow of log event information in loggers and handlers is illustrated in the following diagram. [image] Loggers ------- "Logger" objects have a threefold job. First, they expose several methods to application code so that applications can log messages at runtime. Second, logger objects determine which log messages to act upon based upon severity (the default filtering facility) or filter objects. Third, logger objects pass along relevant log messages to all interested log handlers. The most widely used methods on logger objects fall into two categories: configuration and message sending. 
These are the most common configuration methods: * "Logger.setLevel()" specifies the lowest-severity log message a logger will handle, where debug is the lowest built-in severity level and critical is the highest built-in severity. For example, if the severity level is INFO, the logger will handle only INFO, WARNING, ERROR, and CRITICAL messages and will ignore DEBUG messages. * "Logger.addHandler()" and "Logger.removeHandler()" add and remove handler objects from the logger object. Handlers are covered in more detail in Handlers. * "Logger.addFilter()" and "Logger.removeFilter()" add and remove filter objects from the logger object. Filters are covered in more detail in Filter Objects. You don’t need to always call these methods on every logger you create. See the last two paragraphs in this section. With the logger object configured, the following methods create log messages: * "Logger.debug()", "Logger.info()", "Logger.warning()", "Logger.error()", and "Logger.critical()" all create log records with a message and a level that corresponds to their respective method names. The message is actually a format string, which may contain the standard string substitution syntax of "%s", "%d", "%f", and so on. The rest of their arguments is a list of objects that correspond with the substitution fields in the message. With regard to "**kwargs", the logging methods care only about a keyword of "exc_info" and use it to determine whether to log exception information. * "Logger.exception()" creates a log message similar to "Logger.error()". The difference is that "Logger.exception()" dumps a stack trace along with it. Call this method only from an exception handler. * "Logger.log()" takes a log level as an explicit argument. This is a little more verbose for logging messages than using the log level convenience methods listed above, but this is how to log at custom log levels. "getLogger()" returns a reference to a logger instance with the specified name if it is provided, or "root" if not. The names are period-separated hierarchical structures. Multiple calls to "getLogger()" with the same name will return a reference to the same logger object. Loggers that are further down in the hierarchical list are children of loggers higher up in the list. For example, given a logger with a name of "foo", loggers with names of "foo.bar", "foo.bar.baz", and "foo.bam" are all descendants of "foo". Loggers have a concept of *effective level*. If a level is not explicitly set on a logger, the level of its parent is used instead as its effective level. If the parent has no explicit level set, *its* parent is examined, and so on - all ancestors are searched until an explicitly set level is found. The root logger always has an explicit level set ("WARNING" by default). When deciding whether to process an event, the effective level of the logger is used to determine whether the event is passed to the logger’s handlers. Child loggers propagate messages up to the handlers associated with their ancestor loggers. Because of this, it is unnecessary to define and configure handlers for all the loggers an application uses. It is sufficient to configure handlers for a top-level logger and create child loggers as needed. (You can, however, turn off propagation by setting the *propagate* attribute of a logger to "False".) Handlers -------- "Handler" objects are responsible for dispatching the appropriate log messages (based on the log messages’ severity) to the handler’s specified destination. 
"Logger" objects can add zero or more handler objects to themselves with an "addHandler()" method. As an example scenario, an application may want to send all log messages to a log file, all log messages of error or higher to stdout, and all messages of critical to an email address. This scenario requires three individual handlers where each handler is responsible for sending messages of a specific severity to a specific location. The standard library includes quite a few handler types (see Useful Handlers); the tutorials use mainly "StreamHandler" and "FileHandler" in its examples. There are very few methods in a handler for application developers to concern themselves with. The only handler methods that seem relevant for application developers who are using the built-in handler objects (that is, not creating custom handlers) are the following configuration methods: * The "setLevel()" method, just as in logger objects, specifies the lowest severity that will be dispatched to the appropriate destination. Why are there two "setLevel()" methods? The level set in the logger determines which severity of messages it will pass to its handlers. The level set in each handler determines which messages that handler will send on. * "setFormatter()" selects a Formatter object for this handler to use. * "addFilter()" and "removeFilter()" respectively configure and deconfigure filter objects on handlers. Application code should not directly instantiate and use instances of "Handler". Instead, the "Handler" class is a base class that defines the interface that all handlers should have and establishes some default behavior that child classes can use (or override). Formatters ---------- Formatter objects configure the final order, structure, and contents of the log message. Unlike the base "logging.Handler" class, application code may instantiate formatter classes, although you could likely subclass the formatter if your application needs special behavior. The constructor takes three optional arguments – a message format string, a date format string and a style indicator. logging.Formatter.__init__(fmt=None, datefmt=None, style='%') If there is no message format string, the default is to use the raw message. If there is no date format string, the default date format is: %Y-%m-%d %H:%M:%S with the milliseconds tacked on at the end. The "style" is one of "'%'", "'{'", or "'$'". If one of these is not specified, then "'%'" will be used. If the "style" is "'%'", the message format string uses "%()s" styled string substitution; the possible keys are documented in LogRecord attributes. If the style is "'{'", the message format string is assumed to be compatible with "str.format()" (using keyword arguments), while if the style is "'$'" then the message format string should conform to what is expected by "string.Template.substitute()". Changed in version 3.2: Added the "style" parameter. The following message format string will log the time in a human- readable format, the severity of the message, and the contents of the message, in that order: '%(asctime)s - %(levelname)s - %(message)s' Formatters use a user-configurable function to convert the creation time of a record to a tuple. By default, "time.localtime()" is used; to change this for a particular formatter instance, set the "converter" attribute of the instance to a function with the same signature as "time.localtime()" or "time.gmtime()". 
To change it for all formatters, for example if you want all logging times to be shown in GMT, set the "converter" attribute in the Formatter class (to "time.gmtime" for GMT display). Configuring Logging ------------------- Programmers can configure logging in three ways: 1. Creating loggers, handlers, and formatters explicitly using Python code that calls the configuration methods listed above. 2. Creating a logging config file and reading it using the "fileConfig()" function. 3. Creating a dictionary of configuration information and passing it to the "dictConfig()" function. For the reference documentation on the last two options, see Configuration functions. The following example configures a very simple logger, a console handler, and a simple formatter using Python code: import logging # create logger logger = logging.getLogger('simple_example') logger.setLevel(logging.DEBUG) # create console handler and set level to debug ch = logging.StreamHandler() ch.setLevel(logging.DEBUG) # create formatter formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') # add formatter to ch ch.setFormatter(formatter) # add ch to logger logger.addHandler(ch) # 'application' code logger.debug('debug message') logger.info('info message') logger.warning('warn message') logger.error('error message') logger.critical('critical message') Running this module from the command line produces the following output: $ python simple_logging_module.py 2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message 2005-03-19 15:10:26,620 - simple_example - INFO - info message 2005-03-19 15:10:26,695 - simple_example - WARNING - warn message 2005-03-19 15:10:26,697 - simple_example - ERROR - error message 2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message The following Python module creates a logger, handler, and formatter nearly identical to those in the example listed above, with the only difference being the names of the objects: import logging import logging.config logging.config.fileConfig('logging.conf') # create logger logger = logging.getLogger('simpleExample') # 'application' code logger.debug('debug message') logger.info('info message') logger.warning('warn message') logger.error('error message') logger.critical('critical message') Here is the logging.conf file: [loggers] keys=root,simpleExample [handlers] keys=consoleHandler [formatters] keys=simpleFormatter [logger_root] level=DEBUG handlers=consoleHandler [logger_simpleExample] level=DEBUG handlers=consoleHandler qualname=simpleExample propagate=0 [handler_consoleHandler] class=StreamHandler level=DEBUG formatter=simpleFormatter args=(sys.stdout,) [formatter_simpleFormatter] format=%(asctime)s - %(name)s - %(levelname)s - %(message)s The output is nearly identical to that of the non-config-file-based example: $ python simple_logging_config.py 2005-03-19 15:38:55,977 - simpleExample - DEBUG - debug message 2005-03-19 15:38:55,979 - simpleExample - INFO - info message 2005-03-19 15:38:56,054 - simpleExample - WARNING - warn message 2005-03-19 15:38:56,055 - simpleExample - ERROR - error message 2005-03-19 15:38:56,130 - simpleExample - CRITICAL - critical message You can see that the config file approach has a few advantages over the Python code approach, mainly separation of configuration and code and the ability of noncoders to easily modify the logging properties. 
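The same layout can also be expressed as a plain Python dictionary and passed to "dictConfig()", which is introduced below; a minimal sketch mirroring the config file above (the formatter and handler names are illustrative):

   import logging
   import logging.config

   logging.config.dictConfig({
       'version': 1,
       'formatters': {
           'simple': {
               'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
           },
       },
       'handlers': {
           'console': {
               'class': 'logging.StreamHandler',
               'level': 'DEBUG',
               'formatter': 'simple',
           },
       },
       'loggers': {
           'simpleExample': {
               'level': 'DEBUG',
               'handlers': ['console'],
               'propagate': False,
           },
       },
   })

   logging.getLogger('simpleExample').debug('debug message')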
Warning: The "fileConfig()" function takes a default parameter, "disable_existing_loggers", which defaults to "True" for reasons of backward compatibility. This may or may not be what you want, since it will cause any non-root loggers existing before the "fileConfig()" call to be disabled unless they (or an ancestor) are explicitly named in the configuration. Please refer to the reference documentation for more information, and specify "False" for this parameter if you wish.The dictionary passed to "dictConfig()" can also specify a Boolean value with key "disable_existing_loggers", which if not specified explicitly in the dictionary also defaults to being interpreted as "True". This leads to the logger-disabling behaviour described above, which may not be what you want - in which case, provide the key explicitly with a value of "False". Note that the class names referenced in config files need to be either relative to the logging module, or absolute values which can be resolved using normal import mechanisms. Thus, you could use either "WatchedFileHandler" (relative to the logging module) or "mypackage.mymodule.MyHandler" (for a class defined in package "mypackage" and module "mymodule", where "mypackage" is available on the Python import path). In Python 3.2, a new means of configuring logging has been introduced, using dictionaries to hold configuration information. This provides a superset of the functionality of the config-file-based approach outlined above, and is the recommended configuration method for new applications and deployments. Because a Python dictionary is used to hold configuration information, and since you can populate that dictionary using different means, you have more options for configuration. For example, you can use a configuration file in JSON format, or, if you have access to YAML processing functionality, a file in YAML format, to populate the configuration dictionary. Or, of course, you can construct the dictionary in Python code, receive it in pickled form over a socket, or use whatever approach makes sense for your application. Here’s an example of the same configuration as above, in YAML format for the new dictionary-based approach: version: 1 formatters: simple: format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s' handlers: console: class: logging.StreamHandler level: DEBUG formatter: simple stream: ext://sys.stdout loggers: simpleExample: level: DEBUG handlers: [console] propagate: no root: level: DEBUG handlers: [console] For more information about logging using a dictionary, see Configuration functions. What happens if no configuration is provided -------------------------------------------- If no logging configuration is provided, it is possible to have a situation where a logging event needs to be output, but no handlers can be found to output the event. The event is output using a ‘handler of last resort’, stored in "lastResort". This internal handler is not associated with any logger, and acts like a "StreamHandler" which writes the event description message to the current value of "sys.stderr" (therefore respecting any redirections which may be in effect). No formatting is done on the message - just the bare event description message is printed. The handler’s level is set to "WARNING", so all events at this and greater severities will be output. Changed in version 3.2: For versions of Python prior to 3.2, the behaviour is as follows: * If "raiseExceptions" is "False" (production mode), the event is silently dropped. 
* If "raiseExceptions" is "True" (development mode), a message ‘No handlers could be found for logger X.Y.Z’ is printed once. To obtain the pre-3.2 behaviour, "lastResort" can be set to "None". Configuring Logging for a Library --------------------------------- When developing a library which uses logging, you should take care to document how the library uses logging - for example, the names of loggers used. Some consideration also needs to be given to its logging configuration. If the using application does not use logging, and library code makes logging calls, then (as described in the previous section) events of severity "WARNING" and greater will be printed to "sys.stderr". This is regarded as the best default behaviour. If for some reason you *don’t* want these messages printed in the absence of any logging configuration, you can attach a do-nothing handler to the top-level logger for your library. This avoids the message being printed, since a handler will always be found for the library’s events: it just doesn’t produce any output. If the library user configures logging for application use, presumably that configuration will add some handlers, and if levels are suitably configured then logging calls made in library code will send output to those handlers, as normal. A do-nothing handler is included in the logging package: "NullHandler" (since Python 3.1). An instance of this handler could be added to the top-level logger of the logging namespace used by the library (*if* you want to prevent your library’s logged events being output to "sys.stderr" in the absence of logging configuration). If all logging by a library *foo* is done using loggers with names matching ‘foo.x’, ‘foo.x.y’, etc. then the code: import logging logging.getLogger('foo').addHandler(logging.NullHandler()) should have the desired effect. If an organisation produces a number of libraries, then the logger name specified can be ‘orgname.foo’ rather than just ‘foo’. Note: It is strongly advised that you *do not log to the root logger* in your library. Instead, use a logger with a unique and easily identifiable name, such as the "__name__" for your library’s top- level package or module. Logging to the root logger will make it difficult or impossible for the application developer to configure the logging verbosity or handlers of your library as they wish. Note: It is strongly advised that you *do not add any handlers other than* "NullHandler" *to your library’s loggers*. This is because the configuration of handlers is the prerogative of the application developer who uses your library. The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers ‘under the hood’, you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements. Logging Levels ============== The numeric values of logging levels are given in the following table. These are primarily of interest if you want to define your own levels, and need them to have specific values relative to the predefined levels. If you define a level with the same numeric value, it overwrites the predefined value; the predefined name is lost. 
+----------------+-----------------+
| Level          | Numeric value   |
|================|=================|
| "CRITICAL"     | 50              |
+----------------+-----------------+
| "ERROR"        | 40              |
+----------------+-----------------+
| "WARNING"      | 30              |
+----------------+-----------------+
| "INFO"         | 20              |
+----------------+-----------------+
| "DEBUG"        | 10              |
+----------------+-----------------+
| "NOTSET"       | 0               |
+----------------+-----------------+

Levels can also be associated with loggers, being set either by the developer or through loading a saved logging configuration. When a logging method is called on a logger, the logger compares its own level with the level associated with the method call. If the logger’s level is higher than the method call’s, no logging message is actually generated. This is the basic mechanism controlling the verbosity of logging output.

Logging messages are encoded as instances of the "LogRecord" class. When a logger decides to actually log an event, a "LogRecord" instance is created from the logging message.

Logging messages are subjected to a dispatch mechanism through the use of *handlers*, which are instances of subclasses of the "Handler" class. Handlers are responsible for ensuring that a logged message (in the form of a "LogRecord") ends up in a particular location (or set of locations) which is useful for the target audience for that message (such as end users, support desk staff, system administrators, developers). Handlers are passed "LogRecord" instances intended for particular destinations. Each logger can have zero, one or more handlers associated with it (via the "addHandler()" method of "Logger"). In addition to any handlers directly associated with a logger, *all handlers associated with all ancestors of the logger* are called to dispatch the message (unless the *propagate* flag for a logger is set to a false value, at which point the passing to ancestor handlers stops).

Just as for loggers, handlers can have levels associated with them. A handler’s level acts as a filter in the same way as a logger’s level does. If a handler decides to actually dispatch an event, the "emit()" method is used to send the message to its destination. Most user-defined subclasses of "Handler" will need to override this "emit()".

Custom Levels
-------------

Defining your own levels is possible, but should not be necessary, as the existing levels have been chosen on the basis of practical experience. However, if you are convinced that you need custom levels, great care should be exercised when doing this, and it is possibly *a very bad idea to define custom levels if you are developing a library*. That’s because if multiple library authors all define their own custom levels, there is a chance that the logging output from such multiple libraries used together will be difficult for the using developer to control and/or interpret, because a given numeric value might mean different things for different libraries.
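If, after weighing these caveats, you still need a custom level in application code, "addLevelName()" associates a display name with your chosen numeric value; a minimal sketch, assuming a hypothetical TRACE level slotted between "NOTSET" and "DEBUG":

   import logging

   TRACE = 5  # hypothetical value, chosen relative to the table above
   logging.addLevelName(TRACE, 'TRACE')

   logging.basicConfig(level=TRACE)
   logging.getLogger(__name__).log(TRACE, 'very detailed message')
   # The registered name appears in the output:
   # TRACE:__main__:very detailed message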
"TimedRotatingFileHandler" instances send messages to disk files, rotating the log file at certain timed intervals. 6. "SocketHandler" instances send messages to TCP/IP sockets. Since 3.4, Unix domain sockets are also supported. 7. "DatagramHandler" instances send messages to UDP sockets. Since 3.4, Unix domain sockets are also supported. 8. "SMTPHandler" instances send messages to a designated email address. 9. "SysLogHandler" instances send messages to a Unix syslog daemon, possibly on a remote machine. 10. "NTEventLogHandler" instances send messages to a Windows NT/2000/XP event log. 11. "MemoryHandler" instances send messages to a buffer in memory, which is flushed whenever specific criteria are met. 12. "HTTPHandler" instances send messages to an HTTP server using either "GET" or "POST" semantics. 13. "WatchedFileHandler" instances watch the file they are logging to. If the file changes, it is closed and reopened using the file name. This handler is only useful on Unix-like systems; Windows does not support the underlying mechanism used. 14. "QueueHandler" instances send messages to a queue, such as those implemented in the "queue" or "multiprocessing" modules. 15. "NullHandler" instances do nothing with error messages. They are used by library developers who want to use logging, but want to avoid the ‘No handlers could be found for logger *XXX*’ message which can be displayed if the library user has not configured logging. See Configuring Logging for a Library for more information. Added in version 3.1: The "NullHandler" class. Added in version 3.2: The "QueueHandler" class. The "NullHandler", "StreamHandler" and "FileHandler" classes are defined in the core logging package. The other handlers are defined in a sub-module, "logging.handlers". (There is also another sub-module, "logging.config", for configuration functionality.) Logged messages are formatted for presentation through instances of the "Formatter" class. They are initialized with a format string suitable for use with the % operator and a dictionary. For formatting multiple messages in a batch, instances of "BufferingFormatter" can be used. In addition to the format string (which is applied to each message in the batch), there is provision for header and trailer format strings. When filtering based on logger level and/or handler level is not enough, instances of "Filter" can be added to both "Logger" and "Handler" instances (through their "addFilter()" method). Before deciding to process a message further, both loggers and handlers consult all their filters for permission. If any filter returns a false value, the message is not processed further. The basic "Filter" functionality allows filtering by specific logger name. If this feature is used, messages sent to the named logger and its children are allowed through the filter, and all others dropped. Exceptions raised during logging ================================ The logging package is designed to swallow exceptions which occur while logging in production. This is so that errors which occur while handling logging events - such as logging misconfiguration, network or other similar errors - do not cause the application using logging to terminate prematurely. "SystemExit" and "KeyboardInterrupt" exceptions are never swallowed. Other exceptions which occur during the "emit()" method of a "Handler" subclass are passed to its "handleError()" method. The default implementation of "handleError()" in "Handler" checks to see if a module-level variable, "raiseExceptions", is set. 
If set, a traceback is printed to "sys.stderr". If not set, the exception is swallowed. Note: The default value of "raiseExceptions" is "True". This is because during development, you typically want to be notified of any exceptions that occur. It’s advised that you set "raiseExceptions" to "False" for production usage. Using arbitrary objects as messages =================================== In the preceding sections and examples, it has been assumed that the message passed when logging the event is a string. However, this is not the only possibility. You can pass an arbitrary object as a message, and its "__str__()" method will be called when the logging system needs to convert it to a string representation. In fact, if you want to, you can avoid computing a string representation altogether - for example, the "SocketHandler" emits an event by pickling it and sending it over the wire. Optimization ============ Formatting of message arguments is deferred until it cannot be avoided. However, computing the arguments passed to the logging method can also be expensive, and you may want to avoid doing it if the logger will just throw away your event. To decide what to do, you can call the "isEnabledFor()" method which takes a level argument and returns true if the event would be created by the Logger for that level of call. You can write code like this: if logger.isEnabledFor(logging.DEBUG): logger.debug('Message with %s, %s', expensive_func1(), expensive_func2()) so that if the logger’s threshold is set above "DEBUG", the calls to "expensive_func1" and "expensive_func2" are never made. Note: In some cases, "isEnabledFor()" can itself be more expensive than you’d like (e.g. for deeply nested loggers where an explicit level is only set high up in the logger hierarchy). In such cases (or if you want to avoid calling a method in tight loops), you can cache the result of a call to "isEnabledFor()" in a local or instance variable, and use that instead of calling the method each time. Such a cached value would only need to be recomputed when the logging configuration changes dynamically while the application is running (which is not all that common). There are other optimizations which can be made for specific applications which need more precise control over what logging information is collected. Here’s a list of things you can do to avoid processing during logging which you don’t need: +-------------------------------------------------------+-----------------------------------------------------+ | What you don’t want to collect | How to avoid collecting it | |=======================================================|=====================================================| | Information about where calls were made from. | Set "logging._srcfile" to "None". This avoids | | | calling "sys._getframe()", which may help to speed | | | up your code in environments like PyPy (which can’t | | | speed up code that uses "sys._getframe()"). | +-------------------------------------------------------+-----------------------------------------------------+ | Threading information. | Set "logging.logThreads" to "False". | +-------------------------------------------------------+-----------------------------------------------------+ | Current process ID ("os.getpid()") | Set "logging.logProcesses" to "False". | +-------------------------------------------------------+-----------------------------------------------------+ | Current process name when using "multiprocessing" to | Set "logging.logMultiprocessing" to "False". 
| | manage multiple processes. | | +-------------------------------------------------------+-----------------------------------------------------+ | Current "asyncio.Task" name when using "asyncio". | Set "logging.logAsyncioTasks" to "False". | +-------------------------------------------------------+-----------------------------------------------------+ Also note that the core logging module only includes the basic handlers. If you don’t import "logging.handlers" and "logging.config", they won’t take up any memory. Other resources =============== See also: Module "logging" API reference for the logging module. Module "logging.config" Configuration API for the logging module. Module "logging.handlers" Useful handlers included with the logging module. A logging cookbook The Python 2.3 Method Resolution Order ************************************** Note: This is a historical document, provided as an appendix to the official documentation. The Method Resolution Order discussed here was *introduced* in Python 2.3, but it is still used in later versions – including Python 3. By Michele Simionato. Abstract: *This document is intended for Python programmers who want to understand the C3 Method Resolution Order used in Python 2.3. Although it is not intended for newbies, it is quite pedagogical with many worked out examples. I am not aware of other publicly available documents with the same scope, therefore it should be useful.* Disclaimer: *I donate this document to the Python Software Foundation, under the Python 2.3 license. As usual in these circumstances, I warn the reader that what follows* should *be correct, but I don’t give any warranty. Use it at your own risk and peril!* Acknowledgments: *All the people of the Python mailing list who sent me their support. Paul Foley who pointed out various imprecisions and made me to add the part on local precedence ordering. David Goodger for help with the formatting in reStructuredText. David Mertz for help with the editing. Finally, Guido van Rossum who enthusiastically added this document to the official Python 2.3 home-page.* The beginning ============= *Felix qui potuit rerum cognoscere causas* – Virgilius Everything started with a post by Samuele Pedroni to the Python development mailing list [1]. In his post, Samuele showed that the Python 2.2 method resolution order is not monotonic and he proposed to replace it with the C3 method resolution order. Guido agreed with his arguments and therefore now Python 2.3 uses C3. The C3 method itself has nothing to do with Python, since it was invented by people working on Dylan and it is described in a paper intended for lispers [2]. The present paper gives a (hopefully) readable discussion of the C3 algorithm for Pythonistas who want to understand the reasons for the change. First of all, let me point out that what I am going to say only applies to the *new style classes* introduced in Python 2.2: *classic classes* maintain their old method resolution order, depth first and then left to right. Therefore, there is no breaking of old code for classic classes; and even if in principle there could be breaking of code for Python 2.2 new style classes, in practice the cases in which the C3 resolution order differs from the Python 2.2 method resolution order are so rare that no real breaking of code is expected. 
Therefore: *Don’t be scared!*

Moreover, unless you make strong use of multiple inheritance and you have non-trivial hierarchies, you don’t need to understand the C3 algorithm, and you can easily skip this paper. On the other hand, if you really want to know how multiple inheritance works, then this paper is for you. The good news is that things are not as complicated as you might expect.

Let me begin with some basic definitions.

1. Given a class C in a complicated multiple inheritance hierarchy, it is a non-trivial task to specify the order in which methods are overridden, i.e. to specify the order of the ancestors of C.

2. The list of the ancestors of a class C, including the class itself, ordered from the nearest ancestor to the furthest, is called the class precedence list or the *linearization* of C.

3. The *Method Resolution Order* (MRO) is the set of rules that construct the linearization. In the Python literature, the idiom “the MRO of C” is also used as a synonym for the linearization of the class C.

4. For instance, in the case of a single inheritance hierarchy, if C is a subclass of C1, and C1 is a subclass of C2, then the linearization of C is simply the list [C, C1, C2]. However, with multiple inheritance hierarchies, the construction of the linearization is more cumbersome, since it is more difficult to construct a linearization that respects *local precedence ordering* and *monotonicity*.

5. I will discuss the local precedence ordering later, but I can give the definition of monotonicity here. A MRO is monotonic when the following is true: *if C1 precedes C2 in the linearization of C, then C1 precedes C2 in the linearization of any subclass of C*. Otherwise, the innocuous operation of deriving a new class could change the resolution order of methods, potentially introducing very subtle bugs. Examples where this happens will be shown later.

6. Not all classes admit a linearization. There are cases, in complicated hierarchies, where it is not possible to derive a class such that its linearization respects all the desired properties. Here I give an example of this situation. Consider the hierarchy

   >>> O = object
   >>> class X(O): pass
   >>> class Y(O): pass
   >>> class A(X,Y): pass
   >>> class B(Y,X): pass

which can be represented with the following inheritance graph, where I have denoted with O the "object" class, which is the beginning of any hierarchy for new style classes:

    -----------
   |           |
   |    O      |
   |  /   \    |
    - X    Y  /
      |  / | /
      | /  |/
      A    B
       \  /
        ?

In this case, it is not possible to derive a new class C from A and B, since X precedes Y in A, but Y precedes X in B, therefore the method resolution order would be ambiguous in C.

Python 2.3 raises an exception in this situation (TypeError: MRO conflict among bases Y, X) forbidding the naive programmer from creating ambiguous hierarchies. Python 2.2 instead does not raise an exception, but chooses an *ad hoc* ordering (CABXYO in this case).

The C3 Method Resolution Order
==============================

Let me introduce a few simple notations which will be useful for the following discussion. I will use the shortcut notation:

   C1 C2 ... CN

to indicate the list of classes [C1, C2, ..., CN].

The *head* of the list is its first element:

   head = C1

whereas the *tail* is the rest of the list:

   tail = C2 ... CN.

I shall also use the notation:

   C + (C1 C2 ... CN) = C C1 C2 ... CN

to denote the sum of the lists [C] + [C1, C2, ..., CN].

Now I can explain how the MRO works in Python 2.3.
Consider a class C in a multiple inheritance hierarchy, with C inheriting from the base classes B1, B2, ..., BN. We want to compute the linearization L[C] of the class C. The rule is the following: *the linearization of C is the sum of C plus the merge of the linearizations of the parents and the list of the parents.*

In symbolic notation:

   L[C(B1 ... BN)] = C + merge(L[B1] ... L[BN], B1 ... BN)

In particular, if C is the "object" class, which has no parents, the linearization is trivial:

   L[object] = object.

However, in general one has to compute the merge according to the following prescription:

*take the head of the first list, i.e. L[B1][0]; if this head is not in the tail of any of the other lists, then add it to the linearization of C and remove it from the lists in the merge, otherwise look at the head of the next list and take it, if it is a good head. Then repeat the operation until all the classes are removed or it is impossible to find good heads. In the latter case, it is impossible to construct the merge; Python 2.3 will refuse to create the class C and will raise an exception.*

This prescription ensures that the merge operation *preserves* the ordering, if the ordering can be preserved. On the other hand, if the order cannot be preserved (as in the example of serious order disagreement discussed above) then the merge cannot be computed.

The computation of the merge is trivial if C has only one parent (single inheritance); in this case:

   L[C(B)] = C + merge(L[B],B) = C + L[B]

However, in the case of multiple inheritance things are more cumbersome and I don’t expect you can understand the rule without a couple of examples ;-)

Examples
========

First example. Consider the following hierarchy:

   >>> O = object
   >>> class F(O): pass
   >>> class E(O): pass
   >>> class D(O): pass
   >>> class C(D,F): pass
   >>> class B(D,E): pass
   >>> class A(B,C): pass

In this case the inheritance graph can be drawn as:

                             6
                            ---
   Level 3                 | O |                  (more general)
                         /  ---  \
                        /    |    \                      |
                       /     |     \                     |
                      /      |      \                    |
                    ---     ---    ---                   |
   Level 2       3 | D |  4| E |  | F | 5                |
                    ---     ---    ---                   |
                     \       \ _ /  |                    |
                      \       / \ _ |                    |
                       \     /     \|                    |
                        ---        ---                   |
   Level 1           1 | B |      | C | 2                |
                        ---        ---                   |
                          \        /                     |
                           \      /                     \ /
                             ---
   Level 0                0 | A |                (more specialized)
                             ---

The linearizations of O, D, E and F are trivial:

   L[O] = O
   L[D] = D O
   L[E] = E O
   L[F] = F O

The linearization of B can be computed as:

   L[B] = B + merge(DO, EO, DE)

We see that D is a good head, therefore we take it and we are reduced to compute "merge(O,EO,E)". Now O is not a good head, since it is in the tail of the sequence EO. In this case the rule says that we have to skip to the next sequence. Then we see that E is a good head; we take it and we are reduced to compute "merge(O,O)" which gives O. Therefore:

   L[B] = B D E O

Using the same procedure one finds:

   L[C] = C + merge(DO,FO,DF)
        = C + D + merge(O,FO,F)
        = C + D + F + merge(O,O)
        = C D F O

Now we can compute:

   L[A] = A + merge(BDEO,CDFO,BC)
        = A + B + merge(DEO,CDFO,C)
        = A + B + C + merge(DEO,DFO)
        = A + B + C + D + merge(EO,FO)
        = A + B + C + D + E + merge(O,FO)
        = A + B + C + D + E + F + merge(O,O)
        = A B C D E F O

In this example, the linearization is ordered in a pretty nice way according to the inheritance level, in the sense that lower levels (i.e. more specialized classes) have higher precedence (see the inheritance graph). However, this is not the general case.
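Since the C3 linearization described here is also the one used by all later versions of Python, you can check this first result directly in a modern interpreter, where every class exposes its linearization via "__mro__":

   >>> O = object
   >>> class F(O): pass
   >>> class E(O): pass
   >>> class D(O): pass
   >>> class C(D,F): pass
   >>> class B(D,E): pass
   >>> class A(B,C): pass
   >>> [cls.__name__ for cls in A.__mro__]
   ['A', 'B', 'C', 'D', 'E', 'F', 'object']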
I leave as an exercise for the reader to compute the linearization for my second example:

   >>> O = object
   >>> class F(O): pass
   >>> class E(O): pass
   >>> class D(O): pass
   >>> class C(D,F): pass
   >>> class B(E,D): pass
   >>> class A(B,C): pass

The only difference with the previous example is the change B(D,E) --> B(E,D); however even such a little modification completely changes the ordering of the hierarchy:

                             6
                            ---
   Level 3                 | O |
                         /  ---  \
                        /    |    \
                       /     |     \
                      /      |      \
                    ---     ---    ---
   Level 2       2 | E |  4| D |  | F | 5
                    ---     ---    ---
                     \      / \     /
                      \    /   \   /
                       \  /     \ /
                        ---     ---
   Level 1         1   | B |   | C | 3
                        ---     ---
                         \       /
                          \     /
                            ---
   Level 0             0  | A |
                            ---

Notice that the class E, which is in the second level of the hierarchy, precedes the class C, which is in the first level of the hierarchy, i.e. E is more specialized than C, even if it is in a higher level.

A lazy programmer can obtain the MRO directly from Python 2.2, since in this case it coincides with the Python 2.3 linearization. It is enough to invoke the "mro()" method of class A:

   >>> A.mro()
   [<class '__main__.A'>, <class '__main__.B'>, <class '__main__.E'>,
    <class '__main__.C'>, <class '__main__.D'>, <class '__main__.F'>,
    <type 'object'>]

Finally, let me consider the example discussed in the first section, involving a serious order disagreement. In this case, it is straightforward to compute the linearizations of O, X, Y, A and B:

   L[O] = O
   L[X] = X O
   L[Y] = Y O
   L[A] = A X Y O
   L[B] = B Y X O

However, it is impossible to compute the linearization for a class C that inherits from A and B:

   L[C] = C + merge(AXYO, BYXO, AB)
        = C + A + merge(XYO, BYXO, B)
        = C + A + B + merge(XYO, YXO)

At this point we cannot merge the lists XYO and YXO, since X is in the tail of YXO whereas Y is in the tail of XYO: therefore there are no good heads and the C3 algorithm stops. Python 2.3 raises an error and refuses to create the class C.

Bad Method Resolution Orders
============================

A MRO is *bad* when it breaks such fundamental properties as local precedence ordering and monotonicity. In this section, I will show that both the MRO for classic classes and the MRO for new style classes in Python 2.2 are bad.

It is easier to start with the local precedence ordering. Consider the following example:

   >>> F=type('Food',(),{'remember2buy':'spam'})
   >>> E=type('Eggs',(F,),{'remember2buy':'eggs'})
   >>> G=type('GoodFood',(F,E),{}) # under Python 2.3 this is an error!

with inheritance diagram

                O
                |
   (buy spam)   F
                | \
                |  E   (buy eggs)
                | /
                G      (buy eggs or spam ?)

We see that class G inherits from F and E, with F *before* E: therefore we would expect the attribute *G.remember2buy* to be inherited by *F.remember2buy* and not by *E.remember2buy*: nevertheless Python 2.2 gives

   >>> G.remember2buy
   'eggs'

This is a breaking of local precedence ordering since the order in the local precedence list, i.e. the list of the parents of G, is not preserved in the Python 2.2 linearization of G:

   L[G,P22]= G E F object   # F *follows* E

One could argue that the reason why F follows E in the Python 2.2 linearization is that F is less specialized than E, since F is the superclass of E; nevertheless the breaking of local precedence ordering is quite non-intuitive and error prone. This is particularly true since it differs from old style classes:

   >>> class F: remember2buy='spam'
   >>> class E(F): remember2buy='eggs'
   >>> class G(F,E): pass
   >>> G.remember2buy
   'spam'

In this case the MRO is GFEF and the local precedence ordering is preserved.

As a general rule, hierarchies such as the previous one should be avoided, since it is unclear if F should override E or vice-versa.
Python 2.3 solves the ambiguity by raising an exception in the creation of class G, effectively stopping the programmer from generating ambiguous hierarchies. The reason is that the C3 algorithm fails when the merge

   merge(FO,EFO,FE)

cannot be computed, because F is in the tail of EFO and E is in the tail of FE.

The real solution is to design a non-ambiguous hierarchy, i.e. to derive G from E and F (the more specific first) and not from F and E; in this case the MRO is GEF without any doubt.

               O
               |
               F (spam)
             / |
   (eggs)   E  |
             \ |
               G
         (eggs, no doubt)

Python 2.3 forces the programmer to write good hierarchies (or, at least, less error-prone ones).

On a related note, let me point out that the Python 2.3 algorithm is smart enough to recognize obvious mistakes, such as the duplication of classes in the list of parents:

   >>> class A(object): pass
   >>> class C(A,A): pass # error
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   TypeError: duplicate base class A

In this situation, Python 2.2 (both for classic classes and new style classes) would not raise any exception.

Finally, I would like to point out two lessons we have learned from this example:

1. despite the name, the MRO determines the resolution order of attributes, not only of methods;

2. the default food for Pythonistas is spam! (but you already knew that ;-)

Having discussed the issue of local precedence ordering, let me now consider the issue of monotonicity. My goal is to show that neither the MRO for classic classes nor that for Python 2.2 new style classes is monotonic.

To prove that the MRO for classic classes is non-monotonic is rather trivial: it is enough to look at the diamond diagram:

      C
     / \
    /   \
   A     B
    \   /
     \ /
      D

One easily discerns the inconsistency:

   L[B,P21] = B C        # B precedes C : B's methods win
   L[D,P21] = D A C B C  # B follows C  : C's methods win!

On the other hand, there are no problems with the Python 2.2 and 2.3 MROs, which both give:

   L[D] = D A B C

Guido points out in his essay [3] that the classic MRO is not so bad in practice, since one can typically avoid diamonds for classic classes. But all new style classes inherit from "object", therefore diamonds are unavoidable and inconsistencies show up in every multiple inheritance graph.

The MRO of Python 2.2 makes breaking monotonicity difficult, but not impossible. The following example, originally provided by Samuele Pedroni, shows that the MRO of Python 2.2 is non-monotonic:

   >>> class A(object): pass
   >>> class B(object): pass
   >>> class C(object): pass
   >>> class D(object): pass
   >>> class E(object): pass
   >>> class K1(A,B,C): pass
   >>> class K2(D,B,E): pass
   >>> class K3(D,A): pass
   >>> class Z(K1,K2,K3): pass

Here are the linearizations according to the C3 MRO (the reader should verify these linearizations as an exercise and draw the inheritance diagram ;-)

   L[A] = A O
   L[B] = B O
   L[C] = C O
   L[D] = D O
   L[E] = E O
   L[K1] = K1 A B C O
   L[K2] = K2 D B E O
   L[K3] = K3 D A O
   L[Z] = Z K1 K2 K3 D A B C E O

Python 2.2 gives exactly the same linearizations for A, B, C, D, E, K1, K2 and K3, but a different linearization for Z:

   L[Z,P22] = Z K1 K3 A K2 D B C E O

It is clear that this linearization is *wrong*, since A comes before D whereas in the linearization of K3 A comes *after* D. In other words, in K3 methods derived from D override methods derived from A, but in Z, which is still a subclass of K3, methods derived from A override methods derived from D! This is a violation of monotonicity.
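Because current Python implements C3, the correct linearization of Z can be confirmed directly in the interpreter, assuming the classes A … E, K1, K2, K3 and Z have been entered as above:

   >>> [c.__name__ for c in Z.__mro__]
   ['Z', 'K1', 'K2', 'K3', 'D', 'A', 'B', 'C', 'E', 'object']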
Moreover, the Python 2.2 linearization of Z is also inconsistent with local precedence ordering, since the local precedence list of the class Z is [K1, K2, K3] (K2 precedes K3), whereas in the linearization of Z K2 *follows* K3. These problems explain why the 2.2 rule has been dismissed in favor of the C3 rule.

The end
=======

This section is for the impatient reader, who skipped all the previous sections and jumped immediately to the end. This section is for the lazy programmer too, who didn’t want to exercise her/his brain. Finally, it is for the programmer with some hubris, otherwise s/he would not be reading a paper on the C3 method resolution order in multiple inheritance hierarchies ;-) These three virtues taken all together (and *not* separately) deserve a prize: the prize is a short Python 2.2 script that allows you to compute the 2.3 MRO without risk to your brain. Simply change the last line to play with the various examples I have discussed in this paper:

   """C3 algorithm by Samuele Pedroni (with readability enhanced by me)."""

   class __metaclass__(type):
       "All classes are metamagically modified to be nicely printed"
       __repr__ = lambda cls: cls.__name__

   class ex_2:
       "Serious order disagreement"  # From Guido
       class O: pass
       class X(O): pass
       class Y(O): pass
       class A(X,Y): pass
       class B(Y,X): pass
       try:
           class Z(A,B): pass  # creates Z(A,B) in Python 2.2
       except TypeError:
           pass  # Z(A,B) cannot be created in Python 2.3

   class ex_5:
       "My first example"
       class O: pass
       class F(O): pass
       class E(O): pass
       class D(O): pass
       class C(D,F): pass
       class B(D,E): pass
       class A(B,C): pass

   class ex_6:
       "My second example"
       class O: pass
       class F(O): pass
       class E(O): pass
       class D(O): pass
       class C(D,F): pass
       class B(E,D): pass
       class A(B,C): pass

   class ex_9:
       "Difference between Python 2.2 MRO and C3"  # From Samuele
       class O: pass
       class A(O): pass
       class B(O): pass
       class C(O): pass
       class D(O): pass
       class E(O): pass
       class K1(A,B,C): pass
       class K2(D,B,E): pass
       class K3(D,A): pass
       class Z(K1,K2,K3): pass

   def merge(seqs):
       print '\n\nCPL[%s]=%s' % (seqs[0][0], seqs),
       res = []; i = 0
       while 1:
           nonemptyseqs = [seq for seq in seqs if seq]
           if not nonemptyseqs: return res
           i += 1; print '\n', i, 'round: candidates...',
           for seq in nonemptyseqs:  # find merge candidates among seq heads
               cand = seq[0]; print ' ', cand,
               nothead = [s for s in nonemptyseqs if cand in s[1:]]
               if nothead: cand = None  # reject candidate
               else: break
           if not cand: raise "Inconsistent hierarchy"
           res.append(cand)
           for seq in nonemptyseqs:  # remove cand
               if seq[0] == cand: del seq[0]

   def mro(C):
       "Compute the class precedence list (mro) according to C3"
       return merge([[C]] + map(mro, C.__bases__) + [list(C.__bases__)])

   def print_mro(C):
       print '\nMRO[%s]=%s' % (C, mro(C))
       print '\nP22 MRO[%s]=%s' % (C, C.mro())

   print_mro(ex_9.Z)

   # That’s all folks, enjoy!

Resources
=========

[1] The thread on python-dev started by Samuele Pedroni:
    https://mail.python.org/pipermail/python-dev/2002-October/029035.html

[2] The paper *A Monotonic Superclass Linearization for Dylan*:
    https://doi.org/10.1145/236337.236343

[3] Guido van Rossum’s essay, *Unifying types and classes in Python 2.2*:
    https://web.archive.org/web/20140210194412/http://www.python.org/download/releases/2.2.2/descrintro

Python support for the Linux "perf" profiler
********************************************

author: Pablo Galindo

The Linux perf profiler is a very powerful tool that allows you to profile and obtain information about the performance of your application.
"perf" also has a very vibrant ecosystem of tools that aid with the analysis of the data that it produces. The main problem with using the "perf" profiler with Python applications is that "perf" only gets information about native symbols, that is, the names of functions and procedures written in C. This means that the names and file names of Python functions in your code will not appear in the output of "perf". Since Python 3.12, the interpreter can run in a special mode that allows Python functions to appear in the output of the "perf" profiler. When this mode is enabled, the interpreter will interpose a small piece of code compiled on the fly before the execution of every Python function and it will teach "perf" the relationship between this piece of code and the associated Python function using perf map files. Note: Support for the "perf" profiler is currently only available for Linux on select architectures. Check the output of the "configure" build step or check the output of "python -m sysconfig | grep HAVE_PERF_TRAMPOLINE" to see if your system is supported. For example, consider the following script: def foo(n): result = 0 for _ in range(n): result += 1 return result def bar(n): foo(n) def baz(n): bar(n) if __name__ == "__main__": baz(1000000) We can run "perf" to sample CPU stack traces at 9999 hertz: $ perf record -F 9999 -g -o perf.data python my_script.py Then we can use "perf report" to analyze the data: $ perf report --stdio -n -g # Children Self Samples Command Shared Object Symbol # ........ ........ ............ .......... .................. .......................................... # 91.08% 0.00% 0 python.exe python.exe [.] _start | ---_start | --90.71%--__libc_start_main Py_BytesMain | |--56.88%--pymain_run_python.constprop.0 | | | |--56.13%--_PyRun_AnyFileObject | | _PyRun_SimpleFileObject | | | | | |--55.02%--run_mod | | | | | | | --54.65%--PyEval_EvalCode | | | _PyEval_EvalFrameDefault | | | PyObject_Vectorcall | | | _PyEval_Vector | | | _PyEval_EvalFrameDefault | | | PyObject_Vectorcall | | | _PyEval_Vector | | | _PyEval_EvalFrameDefault | | | PyObject_Vectorcall | | | _PyEval_Vector | | | | | | | |--51.67%--_PyEval_EvalFrameDefault | | | | | | | | | |--11.52%--_PyLong_Add | | | | | | | | | | | |--2.97%--_PyObject_Malloc ... As you can see, the Python functions are not shown in the output, only "_PyEval_EvalFrameDefault" (the function that evaluates the Python bytecode) shows up. Unfortunately that’s not very useful because all Python functions use the same C function to evaluate bytecode so we cannot know which Python function corresponds to which bytecode- evaluating function. Instead, if we run the same experiment with "perf" support enabled we get: $ perf report --stdio -n -g # Children Self Samples Command Shared Object Symbol # ........ ........ ............ .......... .................. ..................................................................... # 90.58% 0.36% 1 python.exe python.exe [.] 
_start | ---_start | --89.86%--__libc_start_main Py_BytesMain | |--55.43%--pymain_run_python.constprop.0 | | | |--54.71%--_PyRun_AnyFileObject | | _PyRun_SimpleFileObject | | | | | |--53.62%--run_mod | | | | | | | --53.26%--PyEval_EvalCode | | | py:::/src/script.py | | | _PyEval_EvalFrameDefault | | | PyObject_Vectorcall | | | _PyEval_Vector | | | py::baz:/src/script.py | | | _PyEval_EvalFrameDefault | | | PyObject_Vectorcall | | | _PyEval_Vector | | | py::bar:/src/script.py | | | _PyEval_EvalFrameDefault | | | PyObject_Vectorcall | | | _PyEval_Vector | | | py::foo:/src/script.py | | | | | | | |--51.81%--_PyEval_EvalFrameDefault | | | | | | | | | |--13.77%--_PyLong_Add | | | | | | | | | | | |--3.26%--_PyObject_Malloc How to enable "perf" profiling support ====================================== "perf" profiling support can be enabled either from the start using the environment variable "PYTHONPERFSUPPORT" or the "-X perf" option, or dynamically using "sys.activate_stack_trampoline()" and "sys.deactivate_stack_trampoline()". The "sys" functions take precedence over the "-X" option, the "-X" option takes precedence over the environment variable. Example, using the environment variable: $ PYTHONPERFSUPPORT=1 perf record -F 9999 -g -o perf.data python my_script.py $ perf report -g -i perf.data Example, using the "-X" option: $ perf record -F 9999 -g -o perf.data python -X perf my_script.py $ perf report -g -i perf.data Example, using the "sys" APIs in file "example.py": import sys sys.activate_stack_trampoline("perf") do_profiled_stuff() sys.deactivate_stack_trampoline() non_profiled_stuff() …then: $ perf record -F 9999 -g -o perf.data python ./example.py $ perf report -g -i perf.data How to obtain the best results ============================== For best results, Python should be compiled with "CFLAGS="-fno-omit- frame-pointer -mno-omit-leaf-frame-pointer"" as this allows profilers to unwind using only the frame pointer and not on DWARF debug information. This is because as the code that is interposed to allow "perf" support is dynamically generated it doesn’t have any DWARF debugging information available. You can check if your system has been compiled with this flag by running: $ python -m sysconfig | grep 'no-omit-frame-pointer' If you don’t see any output it means that your interpreter has not been compiled with frame pointers and therefore it may not be able to show Python functions in the output of "perf". How to work without frame pointers ================================== If you are working with a Python interpreter that has been compiled without frame pointers, you can still use the "perf" profiler, but the overhead will be a bit higher because Python needs to generate unwinding information for every Python function call on the fly. Additionally, "perf" will take more time to process the data because it will need to use the DWARF debugging information to unwind the stack and this is a slow process. To enable this mode, you can use the environment variable "PYTHON_PERF_JIT_SUPPORT" or the "-X perf_jit" option, which will enable the JIT mode for the "perf" profiler. Note: Due to a bug in the "perf" tool, only "perf" versions higher than v6.8 will work with the JIT mode. The fix was also backported to the v6.7.2 version of the tool.Note that when checking the version of the "perf" tool (which can be done by running "perf version") you must take into account that some distros add some custom version numbers including a "-" character. 
This means that "perf 6.7-3" is not necessarily "perf 6.7.3".

When using the perf JIT mode, you need an extra step before you can run "perf report". You need to call the "perf inject" command to inject the JIT information into the "perf.data" file:

   $ perf record -F 9999 -g -k 1 --call-graph dwarf -o perf.data python -Xperf_jit my_script.py
   $ perf inject -i perf.data --jit --output perf.jit.data
   $ perf report -g -i perf.jit.data

or using the environment variable:

   $ PYTHON_PERF_JIT_SUPPORT=1 perf record -F 9999 -g --call-graph dwarf -o perf.data python my_script.py
   $ perf inject -i perf.data --jit --output perf.jit.data
   $ perf report -g -i perf.jit.data

The "perf inject --jit" command will read "perf.data", automatically pick up the perf dump file that Python creates (in "/tmp/perf-$PID.dump"), and then create "perf.jit.data" which merges all the JIT information together. It should also create a lot of "jitted-XXXX-N.so" files in the current directory which are ELF images for all the JIT trampolines that were created by Python.

Warning: When using "--call-graph dwarf", the "perf" tool will take snapshots of the stack of the process being profiled and save the information in the "perf.data" file. By default, the size of the stack dump is 8192 bytes, but you can change the size by passing it after a comma, like "--call-graph dwarf,16384". The size of the stack dump is important because if the size is too small "perf" will not be able to unwind the stack and the output will be incomplete. On the other hand, if the size is too big, then "perf" won’t be able to sample the process as frequently as it would like, as the overhead will be higher. The stack size is particularly important when profiling Python code compiled with low optimization levels (like "-O0"), as these builds tend to have larger stack frames. If you are compiling Python with "-O0" and not seeing Python functions in your profiling output, try increasing the stack dump size to 65528 bytes (the maximum):

   $ perf record -F 9999 -g -k 1 --call-graph dwarf,65528 -o perf.data python -Xperf_jit my_script.py

Different compilation flags can significantly impact stack sizes:

* Builds with "-O0" typically have much larger stack frames than those with "-O1" or higher

* Adding optimizations ("-O1", "-O2", etc.) typically reduces stack size

* Frame pointers ("-fno-omit-frame-pointer") generally provide more reliable stack unwinding

How to port Python 2 Code to Python 3
*************************************

author: Brett Cannon

Python 2 reached its official end-of-life at the start of 2020. This means that no new bug reports, fixes, or changes will be made to Python 2 - it’s no longer supported: see **PEP 373** and status of Python versions.

If you are looking to port an extension module instead of pure Python code, please see Porting Extension Modules to Python 3.

The archived python-porting mailing list may contain some useful guidance.

As of Python 3.11, the original porting guide has been discontinued. You can find the old guide in the archive.

Third-party guides
==================

There are also multiple third-party guides that might be useful:

* Guide by Fedora

* PyCon 2020 tutorial

* Guide by DigitalOcean

* Guide by ActiveState

Regular Expression HOWTO
************************

Author: A.M. Kuchling

Abstract
^^^^^^^^

This document is an introductory tutorial to using regular expressions in Python with the "re" module. It provides a gentler introduction than the corresponding section in the Library Reference.
Introduction
============

Regular expressions (called REs, or regexes, or regex patterns) are essentially a tiny, highly specialized programming language embedded inside Python and made available through the "re" module. Using this little language, you specify the rules for the set of possible strings that you want to match; this set might contain English sentences, or e-mail addresses, or TeX commands, or anything you like. You can then ask questions such as “Does this string match the pattern?”, or “Is there a match for the pattern anywhere in this string?”. You can also use REs to modify a string or to split it apart in various ways.

Regular expression patterns are compiled into a series of bytecodes which are then executed by a matching engine written in C. For advanced use, it may be necessary to pay careful attention to how the engine will execute a given RE, and write the RE in a certain way in order to produce bytecode that runs faster. Optimization isn’t covered in this document, because it requires that you have a good understanding of the matching engine’s internals.

The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions. There are also tasks that *can* be done with regular expressions, but the expressions turn out to be very complicated. In these cases, you may be better off writing Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable.

Simple Patterns
===============

We’ll start by learning about the simplest possible regular expressions. Since regular expressions are used to operate on strings, we’ll begin with the most common task: matching characters.

For a detailed explanation of the computer science underlying regular expressions (deterministic and non-deterministic finite automata), you can refer to almost any textbook on writing compilers.

Matching Characters
-------------------

Most letters and characters will simply match themselves. For example, the regular expression "test" will match the string "test" exactly. (You can enable a case-insensitive mode that would let this RE match "Test" or "TEST" as well; more about this later.)

There are exceptions to this rule; some characters are special *metacharacters*, and don’t match themselves. Instead, they signal that some out-of-the-ordinary thing should be matched, or they affect other portions of the RE by repeating them or changing their meaning. Much of this document is devoted to discussing various metacharacters and what they do.

Here’s a complete list of the metacharacters; their meanings will be discussed in the rest of this HOWTO.

   . ^ $ * + ? { } [ ] \ | ( )

The first metacharacters we’ll look at are "[" and "]". They’re used for specifying a character class, which is a set of characters that you wish to match. Characters can be listed individually, or a range of characters can be indicated by giving two characters and separating them by a "'-'". For example, "[abc]" will match any of the characters "a", "b", or "c"; this is the same as "[a-c]", which uses a range to express the same set of characters. If you wanted to match only lowercase letters, your RE would be "[a-z]".

Metacharacters (except "\") are not active inside classes. For example, "[akm$]" will match any of the characters "'a'", "'k'", "'m'", or "'$'"; "'$'" is usually a metacharacter, but inside a character class it’s stripped of its special nature.
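For instance, using the "re.findall()" function (covered later in this HOWTO) to pull out every character the class matches, you can see that the "$" is matched literally, exactly as described above:

   >>> import re
   >>> re.findall(r'[akm$]', 'spam and $100')
   ['a', 'm', 'a', '$']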
You can match the characters not listed within the class by *complementing* the set. This is indicated by including a "'^'" as the first character of the class. For example, "[^5]" will match any character except "'5'". If the caret appears elsewhere in a character class, it does not have special meaning. For example: "[5^]" will match either a "'5'" or a "'^'".

Perhaps the most important metacharacter is the backslash, "\". As in Python string literals, the backslash can be followed by various characters to signal various special sequences. It’s also used to escape all the metacharacters so you can still match them in patterns; for example, if you need to match a "[" or "\", you can precede them with a backslash to remove their special meaning: "\[" or "\\".

Some of the special sequences beginning with "'\'" represent predefined sets of characters that are often useful, such as the set of digits, the set of letters, or the set of anything that isn’t whitespace.

Let’s take an example: "\w" matches any alphanumeric character. If the regex pattern is expressed in bytes, this is equivalent to the class "[a-zA-Z0-9_]". If the regex pattern is a string, "\w" will match all the characters marked as letters in the Unicode database provided by the "unicodedata" module. You can use the more restricted definition of "\w" in a string pattern by supplying the "re.ASCII" flag when compiling the regular expression.

The following list of special sequences isn’t complete. For a complete list of sequences and expanded class definitions for Unicode string patterns, see the last part of Regular Expression Syntax in the Standard Library reference. In general, the Unicode versions match any character that’s in the appropriate category in the Unicode database.

"\d"
   Matches any decimal digit; this is equivalent to the class "[0-9]".

"\D"
   Matches any non-digit character; this is equivalent to the class "[^0-9]".

"\s"
   Matches any whitespace character; this is equivalent to the class "[ \t\n\r\f\v]".

"\S"
   Matches any non-whitespace character; this is equivalent to the class "[^ \t\n\r\f\v]".

"\w"
   Matches any alphanumeric character; this is equivalent to the class "[a-zA-Z0-9_]".

"\W"
   Matches any non-alphanumeric character; this is equivalent to the class "[^a-zA-Z0-9_]".

These sequences can be included inside a character class. For example, "[\s,.]" is a character class that will match any whitespace character, or "','" or "'.'".

The final metacharacter in this section is ".". It matches anything except a newline character, and there’s an alternate mode ("re.DOTALL") where it will match even a newline. "." is often used where you want to match “any character”.

Repeating Things
----------------

Being able to match varying sets of characters is the first thing regular expressions can do that isn’t already possible with the methods available on strings. However, if that was the only additional capability of regexes, they wouldn’t be much of an advance. Another capability is that you can specify that portions of the RE must be repeated a certain number of times.

The first metacharacter for repeating things that we’ll look at is "*". "*" doesn’t match the literal character "'*'"; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once. For example, "ca*t" will match "'ct'" (0 "'a'" characters), "'cat'" (1 "'a'"), "'caaat'" (3 "'a'" characters), and so forth.
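A quick check with "re.fullmatch()", which requires the whole string to match the pattern (it isn’t otherwise covered in this HOWTO, but is handy for experiments like this):

   >>> import re
   >>> [bool(re.fullmatch('ca*t', s)) for s in ('ct', 'cat', 'caaat', 'cart')]
   [True, True, True, False]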
Repetitions such as "*" are *greedy*; when repeating a RE, the matching engine will try to repeat it as many times as possible. If later portions of the pattern don’t match, the matching engine will then back up and try again with fewer repetitions.

A step-by-step example will make this more obvious. Let’s consider the expression "a[bcd]*b". This matches the letter "'a'", zero or more letters from the class "[bcd]", and finally ends with a "'b'". Now imagine matching this RE against the string "'abcbd'".

+--------+-------------+-----------------------------------+
| Step   | Matched     | Explanation                       |
|========|=============|===================================|
| 1      | "a"         | The "a" in the RE matches.        |
+--------+-------------+-----------------------------------+
| 2      | "abcbd"     | The engine matches "[bcd]*",      |
|        |             | going as far as it can, which is  |
|        |             | to the end of the string.         |
+--------+-------------+-----------------------------------+
| 3      | *Failure*   | The engine tries to match "b",    |
|        |             | but the current position is at    |
|        |             | the end of the string, so it      |
|        |             | fails.                            |
+--------+-------------+-----------------------------------+
| 4      | "abcb"      | Back up, so that "[bcd]*"         |
|        |             | matches one less character.       |
+--------+-------------+-----------------------------------+
| 5      | *Failure*   | Try "b" again, but the current    |
|        |             | position is at the last           |
|        |             | character, which is a "'d'".      |
+--------+-------------+-----------------------------------+
| 6      | "abc"       | Back up again, so that "[bcd]*"   |
|        |             | is only matching "bc".            |
+--------+-------------+-----------------------------------+
| 7      | "abcb"      | Try "b" again. This time the      |
|        |             | character at the current position |
|        |             | is "'b'", so it succeeds.         |
+--------+-------------+-----------------------------------+

The end of the RE has now been reached, and it has matched "'abcb'". This demonstrates how the matching engine goes as far as it can at first, and if no match is found it will then progressively back up and retry the rest of the RE again and again. It will back up until it has tried zero matches for "[bcd]*", and if that subsequently fails, the engine will conclude that the string doesn’t match the RE at all.

Another repeating metacharacter is "+", which matches one or more times. Pay careful attention to the difference between "*" and "+"; "*" matches *zero* or more times, so whatever’s being repeated may not be present at all, while "+" requires at least *one* occurrence. To use a similar example, "ca+t" will match "'cat'" (1 "'a'"), "'caaat'" (3 "'a'"s), but won’t match "'ct'".

There are two more repeating operators or quantifiers. The question mark character, "?", matches either once or zero times; you can think of it as marking something as being optional. For example, "home-?brew" matches either "'homebrew'" or "'home-brew'".

The most complicated quantifier is "{m,n}", where *m* and *n* are decimal integers. This quantifier means there must be at least *m* repetitions, and at most *n*. For example, "a/{1,3}b" will match "'a/b'", "'a//b'", and "'a///b'". It won’t match "'ab'", which has no slashes, or "'a////b'", which has four.

You can omit either *m* or *n*; in that case, a reasonable value is assumed for the missing value. Omitting *m* is interpreted as a lower limit of 0, while omitting *n* results in an upper bound of infinity.

The simplest case "{m}" matches the preceding item exactly *m* times. For example, "a/{2}b" will only match "'a//b'".
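The "a/{1,3}b" behavior just described can be confirmed interactively, again leaning on "re.fullmatch()" for whole-string matching:

   >>> import re
   >>> [bool(re.fullmatch('a/{1,3}b', s)) for s in ('ab', 'a/b', 'a///b', 'a////b')]
   [False, True, True, False]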
Readers of a reductionist bent may notice that the three other quantifiers can all be expressed using this notation. "{0,}" is the same as "*", "{1,}" is equivalent to "+", and "{0,1}" is the same as "?". It’s better to use "*", "+", or "?" when you can, simply because they’re shorter and easier to read.

Using Regular Expressions
=========================

Now that we’ve looked at some simple regular expressions, how do we actually use them in Python? The "re" module provides an interface to the regular expression engine, allowing you to compile REs into objects and then perform matches with them.

Compiling Regular Expressions
-----------------------------

Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.

   >>> import re
   >>> p = re.compile('ab*')
   >>> p
   re.compile('ab*')

"re.compile()" also accepts an optional *flags* argument, used to enable various special features and syntax variations. We’ll go over the available settings later, but for now a single example will do:

   >>> p = re.compile('ab*', re.IGNORECASE)

The RE is passed to "re.compile()" as a string. REs are handled as strings because regular expressions aren’t part of the core Python language, and no special syntax was created for expressing them. (There are applications that don’t need REs at all, so there’s no need to bloat the language specification by including them.) Instead, the "re" module is simply a C extension module included with Python, just like the "socket" or "zlib" modules.

Putting REs in strings keeps the Python language simpler, but has one disadvantage which is the topic of the next section.

The Backslash Plague
--------------------

As stated earlier, regular expressions use the backslash character ("'\'") to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.

Let’s say you want to write a RE that matches the string "\section", which might be found in a LaTeX file. To figure out what to write in the program code, start with the desired string to be matched. Next, you must escape any backslashes and other metacharacters by preceding them with a backslash, resulting in the string "\\section". The resulting string that must be passed to "re.compile()" must be "\\section". However, to express this as a Python string literal, both backslashes must be escaped *again*.

+---------------------+--------------------------------------------+
| Characters          | Stage                                      |
|=====================|============================================|
| "\section"          | Text string to be matched                  |
+---------------------+--------------------------------------------+
| "\\section"         | Escaped backslash for "re.compile()"       |
+---------------------+--------------------------------------------+
| ""\\\\section""     | Escaped backslashes for a string literal   |
+---------------------+--------------------------------------------+

In short, to match a literal backslash, one has to write "'\\\\'" as the RE string, because the regular expression must be "\\", and each backslash must be expressed as "\\" inside a regular Python string literal. In REs that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
The solution is to use Python’s raw string notation for regular expressions; backslashes are not handled in any special way in a string literal prefixed with "'r'", so "r"\n"" is a two-character string containing "'\'" and "'n'", while ""\n"" is a one-character string containing a newline. Regular expressions will often be written in Python code using this raw string notation.

In addition, special escape sequences that are valid in regular expressions, but not valid as Python string literals, now result in a "DeprecationWarning" and will eventually become a "SyntaxError", which means the sequences will be invalid if raw string notation or escaping the backslashes isn’t used.

+---------------------+--------------------+
| Regular String      | Raw string         |
|=====================|====================|
| ""ab*""             | "r"ab*""           |
+---------------------+--------------------+
| ""\\\\section""     | "r"\\section""     |
+---------------------+--------------------+
| ""\\w+\\s+\\1""     | "r"\w+\s+\1""      |
+---------------------+--------------------+

Performing Matches
------------------

Once you have an object representing a compiled regular expression, what do you do with it? Pattern objects have several methods and attributes. Only the most significant ones will be covered here; consult the "re" docs for a complete listing.

+--------------------+-------------------------------------------------+
| Method/Attribute   | Purpose                                         |
|====================|=================================================|
| "match()"          | Determine if the RE matches at the beginning of |
|                    | the string.                                     |
+--------------------+-------------------------------------------------+
| "search()"         | Scan through a string, looking for any location |
|                    | where this RE matches.                          |
+--------------------+-------------------------------------------------+
| "findall()"        | Find all substrings where the RE matches, and   |
|                    | returns them as a list.                         |
+--------------------+-------------------------------------------------+
| "finditer()"       | Find all substrings where the RE matches, and   |
|                    | returns them as an *iterator*.                  |
+--------------------+-------------------------------------------------+

"match()" and "search()" return "None" if no match can be found. If they’re successful, a match object instance is returned, containing information about the match: where it starts and ends, the substring it matched, and more.

You can learn about this by interactively experimenting with the "re" module.

This HOWTO uses the standard Python interpreter for its examples. First, run the Python interpreter, import the "re" module, and compile a RE:

   >>> import re
   >>> p = re.compile('[a-z]+')
   >>> p
   re.compile('[a-z]+')

Now, you can try matching various strings against the RE "[a-z]+". An empty string shouldn’t match at all, since "+" means ‘one or more repetitions’. "match()" should return "None" in this case, which will cause the interpreter to print no output. You can explicitly print the result of "match()" to make this clear.

   >>> p.match("")
   >>> print(p.match(""))
   None

Now, let’s try it on a string that it should match, such as "tempo". In this case, "match()" will return a match object, so you should store the result in a variable for later use.

   >>> m = p.match('tempo')
   >>> m
   <re.Match object; span=(0, 5), match='tempo'>

Now you can query the match object for information about the matching string.
Match object instances also have several methods and attributes; the most important ones are:

+--------------------+----------------------------------------------+
| Method/Attribute   | Purpose                                      |
|====================|==============================================|
| "group()"          | Return the string matched by the RE          |
+--------------------+----------------------------------------------+
| "start()"          | Return the starting position of the match    |
+--------------------+----------------------------------------------+
| "end()"            | Return the ending position of the match      |
+--------------------+----------------------------------------------+
| "span()"           | Return a tuple containing the (start, end)   |
|                    | positions of the match                       |
+--------------------+----------------------------------------------+

Trying these methods will soon clarify their meaning:

   >>> m.group()
   'tempo'
   >>> m.start(), m.end()
   (0, 5)
   >>> m.span()
   (0, 5)

"group()" returns the substring that was matched by the RE. "start()" and "end()" return the starting and ending index of the match. "span()" returns both start and end indexes in a single tuple. Since the "match()" method only checks if the RE matches at the start of a string, "start()" will always be zero. However, the "search()" method of patterns scans through the string, so the match may not start at zero in that case.

   >>> print(p.match('::: message'))
   None
   >>> m = p.search('::: message'); print(m)
   <re.Match object; span=(4, 11), match='message'>
   >>> m.group()
   'message'
   >>> m.span()
   (4, 11)

In actual programs, the most common style is to store the match object in a variable, and then check if it was "None". This usually looks like:

   p = re.compile( ... )
   m = p.match( 'string goes here' )
   if m:
       print('Match found: ', m.group())
   else:
       print('No match')

Two pattern methods return all of the matches for a pattern. "findall()" returns a list of matching strings:

   >>> p = re.compile(r'\d+')
   >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
   ['12', '11', '10']

The "r" prefix, making the literal a raw string literal, is needed in this example because escape sequences in a normal “cooked” string literal that are not recognized by Python, as opposed to regular expressions, now result in a "DeprecationWarning" and will eventually become a "SyntaxError". See The Backslash Plague.

"findall()" has to create the entire list before it can be returned as the result. The "finditer()" method returns a sequence of match object instances as an *iterator*:

   >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
   >>> iterator
   <callable_iterator object at 0x...>
   >>> for match in iterator:
   ...     print(match.span())
   ...
   (0, 2)
   (22, 24)
   (29, 31)

Module-Level Functions
----------------------

You don’t have to create a pattern object and call its methods; the "re" module also provides top-level functions called "match()", "search()", "findall()", "sub()", and so forth. These functions take the same arguments as the corresponding pattern method with the RE string added as the first argument, and still return either "None" or a match object instance.

   >>> print(re.match(r'From\s+', 'Fromage amk'))
   None
   >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
   <re.Match object; span=(0, 5), match='From '>

Under the hood, these functions simply create a pattern object for you and call the appropriate method on it. They also store the compiled object in a cache, so future calls using the same RE won’t need to parse the pattern again and again.

Should you use these module-level functions, or should you get the pattern and call its methods yourself?
If you’re accessing a regex within a loop, pre-compiling it will save a few function calls. Outside of loops, there’s not much difference thanks to the internal cache.

Compilation Flags
-----------------

Compilation flags let you modify some aspects of how regular expressions work. Flags are available in the "re" module under two names, a long name such as "IGNORECASE" and a short, one-letter form such as "I". (If you’re familiar with Perl’s pattern modifiers, the one-letter forms use the same letters; the short form of "re.VERBOSE" is "re.X", for example.) Multiple flags can be specified by bitwise OR-ing them; "re.I | re.M" sets both the "I" and "M" flags, for example.

Here’s a table of the available flags, followed by a more detailed explanation of each one.

+-----------------------------------+----------------------------------------------+
| Flag                              | Meaning                                      |
|===================================|==============================================|
| "ASCII", "A"                      | Makes several escapes like "\w", "\b", "\s"  |
|                                   | and "\d" match only on ASCII characters with |
|                                   | the respective property.                     |
+-----------------------------------+----------------------------------------------+
| "DOTALL", "S"                     | Make "." match any character, including      |
|                                   | newlines.                                    |
+-----------------------------------+----------------------------------------------+
| "IGNORECASE", "I"                 | Do case-insensitive matches.                 |
+-----------------------------------+----------------------------------------------+
| "LOCALE", "L"                     | Do a locale-aware match.                     |
+-----------------------------------+----------------------------------------------+
| "MULTILINE", "M"                  | Multi-line matching, affecting "^" and "$".  |
+-----------------------------------+----------------------------------------------+
| "VERBOSE", "X" (for ‘extended’)   | Enable verbose REs, which can be organized   |
|                                   | more cleanly and understandably.             |
+-----------------------------------+----------------------------------------------+

re.I
re.IGNORECASE

   Perform case-insensitive matching; character class and literal strings will match letters by ignoring case. For example, "[A-Z]" will match lowercase letters, too. Full Unicode matching also works unless the "ASCII" flag is used to disable non-ASCII matches. When the Unicode patterns "[a-z]" or "[A-Z]" are used in combination with the "IGNORECASE" flag, they will match the 52 ASCII letters and 4 additional non-ASCII letters: ‘İ’ (U+0130, Latin capital letter I with dot above), ‘ı’ (U+0131, Latin small letter dotless i), ‘ſ’ (U+017F, Latin small letter long s) and ‘K’ (U+212A, Kelvin sign). "Spam" will match "'Spam'", "'spam'", "'spAM'", or "'ſpam'" (the latter is matched only in Unicode mode). This lowercasing doesn’t take the current locale into account; it will if you also set the "LOCALE" flag.

re.L
re.LOCALE

   Make "\w", "\W", "\b", "\B" and case-insensitive matching dependent on the current locale instead of the Unicode database.

   Locales are a feature of the C library intended to help in writing programs that take account of language differences. For example, if you’re processing encoded French text, you’d want to be able to write "\w+" to match words, but "\w" only matches the character class "[A-Za-z]" in bytes patterns; it won’t match bytes corresponding to "é" or "ç". If your system is configured properly and a French locale is selected, certain C functions will tell the program that the byte corresponding to "é" should also be considered a letter.
Setting the "LOCALE" flag when compiling a regular expression will cause the resulting compiled object to use these C functions for "\w"; this is slower, but also enables "\w+" to match French words as you’d expect. The use of this flag is discouraged in Python 3 as the locale mechanism is very unreliable, it only handles one “culture” at a time, and it only works with 8-bit locales. Unicode matching is already enabled by default in Python 3 for Unicode (str) patterns, and it is able to handle different locales/languages. re.M re.MULTILINE ("^" and "$" haven’t been explained yet; they’ll be introduced in section More Metacharacters.) Usually "^" matches only at the beginning of the string, and "$" matches only at the end of the string and immediately before the newline (if any) at the end of the string. When this flag is specified, "^" matches at the beginning of the string and at the beginning of each line within the string, immediately following each newline. Similarly, the "$" metacharacter matches either at the end of the string and at the end of each line (immediately preceding each newline). re.S re.DOTALL Makes the "'.'" special character match any character at all, including a newline; without this flag, "'.'" will match anything *except* a newline. re.A re.ASCII Make "\w", "\W", "\b", "\B", "\s" and "\S" perform ASCII-only matching instead of full Unicode matching. This is only meaningful for Unicode patterns, and is ignored for byte patterns. re.X re.VERBOSE This flag allows you to write regular expressions that are more readable by granting you more flexibility in how you can format them. When this flag has been specified, whitespace within the RE string is ignored, except when the whitespace is in a character class or preceded by an unescaped backslash; this lets you organize and indent the RE more clearly. This flag also lets you put comments within a RE that will be ignored by the engine; comments are marked by a "'#'" that’s neither in a character class or preceded by an unescaped backslash. For example, here’s a RE that uses "re.VERBOSE"; see how much easier it is to read? charref = re.compile(r""" &[#] # Start of a numeric entity reference ( 0[0-7]+ # Octal form | [0-9]+ # Decimal form | x[0-9a-fA-F]+ # Hexadecimal form ) ; # Trailing semicolon """, re.VERBOSE) Without the verbose setting, the RE would look like this: charref = re.compile("&#(0[0-7]+" "|[0-9]+" "|x[0-9a-fA-F]+);") In the above example, Python’s automatic concatenation of string literals has been used to break up the RE into smaller pieces, but it’s still more difficult to understand than the version using "re.VERBOSE". More Pattern Power ================== So far we’ve only covered a part of the features of regular expressions. In this section, we’ll cover some new metacharacters, and how to use groups to retrieve portions of the text that was matched. More Metacharacters ------------------- There are some metacharacters that we haven’t covered yet. Most of them will be covered in this section. Some of the remaining metacharacters to be discussed are *zero-width assertions*. They don’t cause the engine to advance through the string; instead, they consume no characters at all, and simply succeed or fail. For example, "\b" is an assertion that the current position is located at a word boundary; the position isn’t changed by the "\b" at all. This means that zero-width assertions should never be repeated, because if they match once at a given location, they can obviously be matched an infinite number of times. 
"|" Alternation, or the “or” operator. If *A* and *B* are regular expressions, "A|B" will match any string that matches either *A* or *B*. "|" has very low precedence in order to make it work reasonably when you’re alternating multi-character strings. "Crow|Servo" will match either "'Crow'" or "'Servo'", not "'Cro'", a "'w'" or an "'S'", and "'ervo'". To match a literal "'|'", use "\|", or enclose it inside a character class, as in "[|]". "^" Matches at the beginning of lines. Unless the "MULTILINE" flag has been set, this will only match at the beginning of the string. In "MULTILINE" mode, this also matches immediately after each newline within the string. For example, if you wish to match the word "From" only at the beginning of a line, the RE to use is "^From". >>> print(re.search('^From', 'From Here to Eternity')) >>> print(re.search('^From', 'Reciting From Memory')) None To match a literal "'^'", use "\^". "$" Matches at the end of a line, which is defined as either the end of the string, or any location followed by a newline character. >>> print(re.search('}$', '{block}')) >>> print(re.search('}$', '{block} ')) None >>> print(re.search('}$', '{block}\n')) To match a literal "'$'", use "\$" or enclose it inside a character class, as in "[$]". "\A" Matches only at the start of the string. When not in "MULTILINE" mode, "\A" and "^" are effectively the same. In "MULTILINE" mode, they’re different: "\A" still matches only at the beginning of the string, but "^" may match at any location inside the string that follows a newline character. "\Z" Matches only at the end of the string. "\b" Word boundary. This is a zero-width assertion that matches only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character. The following example matches "class" only when it’s a complete word; it won’t match when it’s contained inside another word. >>> p = re.compile(r'\bclass\b') >>> print(p.search('no class at all')) >>> print(p.search('the declassified algorithm')) None >>> print(p.search('one subclass is')) None There are two subtleties you should remember when using this special sequence. First, this is the worst collision between Python’s string literals and regular expression sequences. In Python’s string literals, "\b" is the backspace character, ASCII value 8. If you’re not using raw strings, then Python will convert the "\b" to a backspace, and your RE won’t match as you expect it to. The following example looks the same as our previous RE, but omits the "'r'" in front of the RE string. >>> p = re.compile('\bclass\b') >>> print(p.search('no class at all')) None >>> print(p.search('\b' + 'class' + '\b')) Second, inside a character class, where there’s no use for this assertion, "\b" represents the backspace character, for compatibility with Python’s string literals. "\B" Another zero-width assertion, this is the opposite of "\b", only matching when the current position is not at a word boundary. Grouping -------- Frequently you need to obtain more information than just whether the RE matched or not. Regular expressions are often used to dissect strings by writing a RE divided into several subgroups which match different components of interest. 
For example, an RFC-822 header line is divided into a header name and a value, separated by a "':'", like this:

   From: author@example.com
   User-Agent: Thunderbird 1.5.0.9 (X11/20061227)
   MIME-Version: 1.0
   To: editor@example.com

This can be handled by writing a regular expression which matches an entire header line, and has one group which matches the header name, and another group which matches the header’s value.

Groups are marked by the "'('", "')'" metacharacters. "'('" and "')'" have much the same meaning as they do in mathematical expressions; they group together the expressions contained inside them, and you can repeat the contents of a group with a quantifier, such as "*", "+", "?", or "{m,n}". For example, "(ab)*" will match zero or more repetitions of "ab".

   >>> p = re.compile('(ab)*')
   >>> print(p.match('ababababab').span())
   (0, 10)

Groups indicated with "'('", "')'" also capture the starting and ending index of the text that they match; this can be retrieved by passing an argument to "group()", "start()", "end()", and "span()". Groups are numbered starting with 0. Group 0 is always present; it’s the whole RE, so match object methods all have group 0 as their default argument. Later we’ll see how to express groups that don’t capture the span of text that they match.

   >>> p = re.compile('(a)b')
   >>> m = p.match('ab')
   >>> m.group()
   'ab'
   >>> m.group(0)
   'ab'

Subgroups are numbered from left to right, from 1 upward. Groups can be nested; to determine the number, just count the opening parenthesis characters, going from left to right.

   >>> p = re.compile('(a(b)c)d')
   >>> m = p.match('abcd')
   >>> m.group(0)
   'abcd'
   >>> m.group(1)
   'abc'
   >>> m.group(2)
   'b'

"group()" can be passed multiple group numbers at a time, in which case it will return a tuple containing the corresponding values for those groups.

   >>> m.group(2,1,2)
   ('b', 'abc', 'b')

The "groups()" method returns a tuple containing the strings for all the subgroups, from 1 up to however many there are.

   >>> m.groups()
   ('abc', 'b')

Backreferences in a pattern allow you to specify that the contents of an earlier capturing group must also be found at the current location in the string. For example, "\1" will succeed if the exact contents of group 1 can be found at the current position, and fails otherwise. Remember that Python’s string literals also use a backslash followed by numbers to allow including arbitrary characters in a string, so be sure to use a raw string when incorporating backreferences in a RE.

For example, the following RE detects doubled words in a string.

   >>> p = re.compile(r'\b(\w+)\s+\1\b')
   >>> p.search('Paris in the the spring').group()
   'the the'

Backreferences like this aren’t often useful for just searching through a string — there are few text formats which repeat data in this way — but you’ll soon find out that they’re *very* useful when performing string substitutions.

Non-capturing and Named Groups
------------------------------

Elaborate REs may use many groups, both to capture substrings of interest, and to group and structure the RE itself. In complex REs, it becomes difficult to keep track of the group numbers. There are two features which help with this problem. Both of them use a common syntax for regular expression extensions, so we’ll look at that first.

Perl 5 is well known for its powerful additions to standard regular expressions.
For these new features the Perl developers couldn’t choose new single-keystroke metacharacters or new special sequences beginning with "\" without making Perl’s regular expressions confusingly different from standard REs. If they chose "&" as a new metacharacter, for example, old expressions would be assuming that "&" was a regular character and wouldn’t have escaped it by writing "\&" or "[&]".

The solution chosen by the Perl developers was to use "(?...)" as the extension syntax. "?" immediately after a parenthesis was a syntax error because the "?" would have nothing to repeat, so this didn’t introduce any compatibility problems. The characters immediately after the "?" indicate what extension is being used, so "(?=foo)" is one thing (a positive lookahead assertion) and "(?:foo)" is something else (a non-capturing group containing the subexpression "foo").

Python supports several of Perl’s extensions and adds an extension syntax to Perl’s extension syntax. If the first character after the question mark is a "P", you know that it’s an extension that’s specific to Python.

Now that we’ve looked at the general extension syntax, we can return to the features that simplify working with groups in complex REs.

Sometimes you’ll want to use a group to denote a part of a regular expression, but aren’t interested in retrieving the group’s contents. You can make this fact explicit by using a non-capturing group: "(?:...)", where you can replace the "..." with any other regular expression.

   >>> m = re.match("([abc])+", "abc")
   >>> m.groups()
   ('c',)
   >>> m = re.match("(?:[abc])+", "abc")
   >>> m.groups()
   ()

Except for the fact that you can’t retrieve the contents of what the group matched, a non-capturing group behaves exactly the same as a capturing group; you can put anything inside it, repeat it with a repetition metacharacter such as "*", and nest it within other groups (capturing or non-capturing). "(?:...)" is particularly useful when modifying an existing pattern, since you can add new groups without changing how all the other groups are numbered. It should be mentioned that there’s no performance difference in searching between capturing and non-capturing groups; neither form is any faster than the other.

A more significant feature is named groups: instead of referring to them by numbers, groups can be referenced by a name.

The syntax for a named group is one of the Python-specific extensions: "(?P<name>...)". *name* is, obviously, the name of the group. Named groups behave exactly like capturing groups, and additionally associate a name with a group. The match object methods that deal with capturing groups all accept either integers that refer to the group by number or strings that contain the desired group’s name. Named groups are still given numbers, so you can retrieve information about a group in two ways:

   >>> p = re.compile(r'(?P<word>\b\w+\b)')
   >>> m = p.search( '(((( Lots of punctuation )))' )
   >>> m.group('word')
   'Lots'
   >>> m.group(1)
   'Lots'

Additionally, you can retrieve named groups as a dictionary with "groupdict()":

   >>> m = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
   >>> m.groupdict()
   {'first': 'Jane', 'last': 'Doe'}

Named groups are handy because they let you use easily remembered names, instead of having to remember numbers.
Here’s an example RE from the "imaplib" module:

   InternalDate = re.compile(r'INTERNALDATE "'
           r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
           r'(?P<year>[0-9][0-9][0-9][0-9])'
           r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
           r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
           r'"')

It’s obviously much easier to retrieve "m.group('zonem')", instead of having to remember to retrieve group 9.

The syntax for backreferences in an expression such as "(...)\1" refers to the number of the group. There’s naturally a variant that uses the group name instead of the number. This is another Python extension: "(?P=name)" indicates that the contents of the group called *name* should again be matched at the current point. The regular expression for finding doubled words, "\b(\w+)\s+\1\b" can also be written as "\b(?P<word>\w+)\s+(?P=word)\b":

   >>> p = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')
   >>> p.search('Paris in the the spring').group()
   'the the'

Lookahead Assertions
--------------------

Another zero-width assertion is the lookahead assertion. Lookahead assertions are available in both positive and negative form, and look like this:

"(?=...)"
   Positive lookahead assertion. This succeeds if the contained regular expression, represented here by "...", successfully matches at the current location, and fails otherwise. But, once the contained expression has been tried, the matching engine doesn’t advance at all; the rest of the pattern is tried right where the assertion started.

"(?!...)"
   Negative lookahead assertion. This is the opposite of the positive assertion; it succeeds if the contained expression *doesn’t* match at the current position in the string.

To make this concrete, let’s look at a case where a lookahead is useful. Consider a simple pattern to match a filename and split it apart into a base name and an extension, separated by a ".". For example, in "news.rc", "news" is the base name, and "rc" is the filename’s extension.

The pattern to match this is quite simple:

   .*[.].*$

Notice that the "." needs to be treated specially because it’s a metacharacter, so it’s inside a character class to only match that specific character. Also notice the trailing "$"; this is added to ensure that all the rest of the string must be included in the extension. This regular expression matches "foo.bar" and "autoexec.bat" and "sendmail.cf" and "printers.conf".

Now, consider complicating the problem a bit; what if you want to match filenames where the extension is not "bat"? Some incorrect attempts:

   .*[.][^b].*$

The first attempt above tries to exclude "bat" by requiring that the first character of the extension is not a "b". This is wrong, because the pattern also doesn’t match "foo.bar".

   .*[.]([^b]..|.[^a].|..[^t])$

The expression gets messier when you try to patch up the first solution by requiring one of the following cases to match: the first character of the extension isn’t "b"; the second character isn’t "a"; or the third character isn’t "t". This accepts "foo.bar" and rejects "autoexec.bat", but it requires a three-letter extension and won’t accept a filename with a two-letter extension such as "sendmail.cf". We’ll complicate the pattern again in an effort to fix it.

   .*[.]([^b].?.?|.[^a]?.?|..?[^t]?)$

In the third attempt, the second and third letters are all made optional in order to allow matching extensions shorter than three characters, such as "sendmail.cf".

The pattern’s getting really complicated now, which makes it hard to read and understand.
Worse, if the problem changes and you want to exclude both "bat" and "exe" as extensions, the pattern would get even more complicated and confusing.

A negative lookahead cuts through all this confusion:

   ".*[.](?!bat$)[^.]*$"

The negative lookahead means: if the expression "bat" doesn’t match at this point, try the rest of the pattern; if "bat$" does match, the whole pattern will fail. The trailing "$" is required to ensure that something like "sample.batch", where the extension only starts with "bat", will be allowed. The "[^.]*" makes sure that the pattern works when there are multiple dots in the filename.

Excluding another filename extension is now easy; simply add it as an alternative inside the assertion. The following pattern excludes filenames that end in either "bat" or "exe":

   ".*[.](?!bat$|exe$)[^.]*$"

Modifying Strings
=================

Up to this point, we’ve simply performed searches against a static string. Regular expressions are also commonly used to modify strings in various ways, using the following pattern methods:

+--------------------+-------------------------------------------------+
| Method/Attribute   | Purpose                                         |
|====================|=================================================|
| "split()"          | Split the string into a list, splitting it      |
|                    | wherever the RE matches                         |
+--------------------+-------------------------------------------------+
| "sub()"            | Find all substrings where the RE matches, and   |
|                    | replace them with a different string            |
+--------------------+-------------------------------------------------+
| "subn()"           | Does the same thing as "sub()", but returns     |
|                    | the new string and the number of replacements   |
+--------------------+-------------------------------------------------+

Splitting Strings
-----------------

The "split()" method of a pattern splits a string apart wherever the RE matches, returning a list of the pieces. It’s similar to the "split()" method of strings but provides much more generality in the delimiters that you can split by; string "split()" only supports splitting by whitespace or by a fixed string. As you’d expect, there’s a module-level "re.split()" function, too.

.split(string[, maxsplit=0])

   Split *string* by the matches of the regular expression. If capturing parentheses are used in the RE, then their contents will also be returned as part of the resulting list. If *maxsplit* is nonzero, at most *maxsplit* splits are performed, and the remainder of the string is returned as the final element of the list.

In the following example, the delimiter is any sequence of non-alphanumeric characters.

   >>> p = re.compile(r'\W+')
   >>> p.split('This is a test, short and sweet, of split().')
   ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
   >>> p.split('This is a test, short and sweet, of split().', 3)
   ['This', 'is', 'a', 'test, short and sweet, of split().']

Sometimes you’re not only interested in what the text between delimiters is, but also need to know what the delimiter was. If capturing parentheses are used in the RE, then their values are also returned as part of the list. Compare the following calls:

   >>> p = re.compile(r'\W+')
   >>> p2 = re.compile(r'(\W+)')
   >>> p.split('This... is a test.')
   ['This', 'is', 'a', 'test', '']
   >>> p2.split('This... is a test.')
   ['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']

The module-level function "re.split()" adds the RE to be used as the first argument, but is otherwise the same.

   >>> re.split(r'[\W]+', 'Words, words, words.')
   ['Words', 'words', 'words', '']
   >>> re.split(r'([\W]+)', 'Words, words, words.')
   ['Words', ', ', 'words', ', ', 'words', '.', '']
   >>> re.split(r'[\W]+', 'Words, words, words.', 1)
   ['Words', 'words, words.']

Search and Replace
------------------

Another common task is to find all the matches for a pattern, and replace them with a different string. The "sub()" method takes a replacement value, which can be either a string or a function, and the string to be processed.

.sub(replacement, string[, count=0])

   Returns the string obtained by replacing the leftmost non-overlapping occurrences of the RE in *string* by the replacement *replacement*. If the pattern isn’t found, *string* is returned unchanged.

   The optional argument *count* is the maximum number of pattern occurrences to be replaced; *count* must be a non-negative integer. The default value of 0 means to replace all occurrences.

Here’s a simple example of using the "sub()" method. It replaces colour names with the word "colour":

   >>> p = re.compile('(blue|white|red)')
   >>> p.sub('colour', 'blue socks and red shoes')
   'colour socks and colour shoes'
   >>> p.sub('colour', 'blue socks and red shoes', count=1)
   'colour socks and red shoes'

The "subn()" method does the same work, but returns a 2-tuple containing the new string value and the number of replacements that were performed:

   >>> p = re.compile('(blue|white|red)')
   >>> p.subn('colour', 'blue socks and red shoes')
   ('colour socks and colour shoes', 2)
   >>> p.subn('colour', 'no colours at all')
   ('no colours at all', 0)

Empty matches are replaced only when they’re not adjacent to a previous empty match.

   >>> p = re.compile('x*')
   >>> p.sub('-', 'abxd')
   '-a-b--d-'

If *replacement* is a string, any backslash escapes in it are processed. That is, "\n" is converted to a single newline character, "\r" is converted to a carriage return, and so forth. Unknown escapes such as "\&" are left alone. Backreferences, such as "\6", are replaced with the substring matched by the corresponding group in the RE. This lets you incorporate portions of the original text in the resulting replacement string.

This example matches the word "section" followed by a string enclosed in "{", "}", and changes "section" to "subsection":

   >>> p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
   >>> p.sub(r'subsection{\1}','section{First} section{second}')
   'subsection{First} subsection{second}'

There’s also a syntax for referring to named groups as defined by the "(?P<name>...)" syntax. "\g<name>" will use the substring matched by the group named "name", and "\g<number>" uses the corresponding group number. "\g<2>" is therefore equivalent to "\2", but isn’t ambiguous in a replacement string such as "\g<2>0". ("\20" would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character "'0'".) The following substitutions are all equivalent, but use all three variations of the replacement string.

   >>> p = re.compile('section{ (?P<name> [^}]* ) }', re.VERBOSE)
   >>> p.sub(r'subsection{\1}','section{First}')
   'subsection{First}'
   >>> p.sub(r'subsection{\g<1>}','section{First}')
   'subsection{First}'
   >>> p.sub(r'subsection{\g<name>}','section{First}')
   'subsection{First}'

*replacement* can also be a function, which gives you even more control.
If *replacement* is a function, the function is called for every non-overlapping occurrence of *pattern*. On each call, the function is passed a match object argument for the match and can use this information to compute the desired replacement string and return it.

In the following example, the replacement function translates decimals into hexadecimal:

   >>> def hexrepl(match):
   ...     "Return the hex string for a decimal number"
   ...     value = int(match.group())
   ...     return hex(value)
   ...
   >>> p = re.compile(r'\d+')
   >>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
   'Call 0xffd2 for printing, 0xc000 for user code.'

When using the module-level "re.sub()" function, the pattern is passed as the first argument. The pattern may be provided as an object or as a string; if you need to specify regular expression flags, you must either use a pattern object as the first parameter, or use embedded modifiers in the pattern string, e.g. "sub("(?i)b+", "x", "bbbb BBBB")" returns "'x x'".

Common Problems
===============

Regular expressions are a powerful tool for some applications, but in some ways their behaviour isn’t intuitive and at times they don’t behave the way you may expect them to. This section will point out some of the most common pitfalls.

Use String Methods
------------------

Sometimes using the "re" module is a mistake. If you’re matching a fixed string, or a single character class, and you’re not using any "re" features such as the "IGNORECASE" flag, then the full power of regular expressions may not be required. Strings have several methods for performing operations with fixed strings and they’re usually much faster, because the implementation is a single small C loop that’s been optimized for the purpose, instead of the large, more generalized regular expression engine.

One example might be replacing a single fixed string with another one; for example, you might replace "word" with "deed". "re.sub()" seems like the function to use for this, but consider the "replace()" method. Note that "replace()" will also replace "word" inside words, turning "swordfish" into "sdeedfish", but the naive RE "word" would have done that, too. (To avoid performing the substitution on parts of words, the pattern would have to be "\bword\b", in order to require that "word" have a word boundary on either side. This takes the job beyond "replace()"’s abilities.)

Another common task is deleting every occurrence of a single character from a string or replacing it with another single character. You might do this with something like "re.sub('\n', ' ', S)", but "translate()" is capable of doing both tasks and will be faster than any regular expression operation can be.

In short, before turning to the "re" module, consider whether your problem can be solved with a faster and simpler string method.

match() versus search()
-----------------------

The "match()" function only checks if the RE matches at the beginning of the string while "search()" will scan forward through the string for a match. It’s important to keep this distinction in mind. Remember, "match()" will only report a successful match which will start at 0; if the match wouldn’t start at zero, "match()" will *not* report it.

   >>> print(re.match('super', 'superstition').span())
   (0, 5)
   >>> print(re.match('super', 'insuperable'))
   None

On the other hand, "search()" will scan forward through the string, reporting the first match it finds.
   >>> print(re.search('super', 'superstition').span())
   (0, 5)
   >>> print(re.search('super', 'insuperable').span())
   (2, 7)

Sometimes you’ll be tempted to keep using "re.match()", and just add ".*" to the front of your RE. Resist this temptation and use "re.search()" instead. The regular expression compiler does some analysis of REs in order to speed up the process of looking for a match. One such analysis figures out what the first character of a match must be; for example, a pattern starting with "Crow" must match starting with a "'C'". The analysis lets the engine quickly scan through the string looking for the starting character, only trying the full match if a "'C'" is found.

Adding ".*" defeats this optimization, requiring scanning to the end of the string and then backtracking to find a match for the rest of the RE. Use "re.search()" instead.

Greedy versus Non-Greedy
------------------------

When repeating a regular expression, as in "a*", the resulting action is to consume as much of the pattern as possible. This fact often bites you when you’re trying to match a pair of balanced delimiters, such as the angle brackets surrounding an HTML tag. The naive pattern for matching a single HTML tag doesn’t work because of the greedy nature of ".*".

   >>> s = '<html><head><title>Title</title>'
   >>> len(s)
   32
   >>> print(re.match('<.*>', s).span())
   (0, 32)
   >>> print(re.match('<.*>', s).group())
   <html><head><title>Title</title>

The RE matches the "'<'" in "'<html>'", and the ".*" consumes the rest of the string. There’s still more left in the RE, though, and the ">" can’t match at the end of the string, so the regular expression engine has to backtrack character by character until it finds a match for the ">". The final match extends from the "'<'" in "'<html>'" to the "'>'" in "'</title>'", which isn’t what you want.

In this case, the solution is to use the non-greedy quantifiers "*?", "+?", "??", or "{m,n}?", which match as *little* text as possible. In the above example, the "'>'" is tried immediately after the first "'<'" matches, and when it fails, the engine advances a character at a time, retrying the "'>'" at every step. This produces just the right result:

   >>> print(re.match('<.*?>', s).group())
   <html>

(Note that parsing HTML or XML with regular expressions is painful. Quick-and-dirty patterns will handle common cases, but HTML and XML have special cases that will break the obvious regular expression; by the time you’ve written a regular expression that handles all of the possible cases, the patterns will be *very* complicated. Use an HTML or XML parser module for such tasks.)

Using re.VERBOSE
----------------

By now you’ve probably noticed that regular expressions are a very compact notation, but they’re not terribly readable. REs of moderate complexity can become lengthy collections of backslashes, parentheses, and metacharacters, making them difficult to read and understand.

For such REs, specifying the "re.VERBOSE" flag when compiling the regular expression can be helpful, because it allows you to format the regular expression more clearly.

The "re.VERBOSE" flag has several effects. Whitespace in the regular expression that *isn’t* inside a character class is ignored. This means that an expression such as "dog | cat" is equivalent to the less readable "dog|cat", but "[a b]" will still match the characters "'a'", "'b'", or a space. In addition, you can also put comments inside a RE; comments extend from a "#" character to the next newline.
When used with triple-quoted strings, this enables REs to be formatted more neatly:

   pat = re.compile(r"""
    \s*                 # Skip leading whitespace
    (?P<header>[^:]+)   # Header name
    \s* :               # Whitespace, and a colon
    (?P<value>.*?)      # The header's value -- *? used to
                        # lose the following trailing whitespace
    \s*$                # Trailing whitespace to end-of-line
   """, re.VERBOSE)

This is far more readable than:

   pat = re.compile(r"\s*(?P<header>[^:]+)\s*:(?P<value>.*?)\s*$")

Feedback
========

Regular expressions are a complicated topic. Did this document help you understand them? Were there parts that were unclear, or problems you encountered that weren’t covered here? If so, please send suggestions for improvements to the author.

The most complete book on regular expressions is almost certainly Jeffrey Friedl’s Mastering Regular Expressions, published by O’Reilly. Unfortunately, it exclusively concentrates on Perl and Java’s flavours of regular expressions, and doesn’t contain any Python material at all, so it won’t be useful as a reference for programming in Python. (The first edition covered Python’s now-removed "regex" module, which won’t help you much.) Consider checking it out from your library.

Socket Programming HOWTO
************************

Author: Gordon McMillan

Abstract
^^^^^^^^

Sockets are used nearly everywhere, but are one of the most severely misunderstood technologies around. This is a 10,000 foot overview of sockets. It’s not really a tutorial - you’ll still have work to do in getting things operational. It doesn’t cover the fine points (and there are a lot of them), but I hope it will give you enough background to begin using them decently.

Sockets
=======

I’m only going to talk about INET (i.e. IPv4) sockets, but they account for at least 99% of the sockets in use. And I’ll only talk about STREAM (i.e. TCP) sockets - unless you really know what you’re doing (in which case this HOWTO isn’t for you!), you’ll get better behavior and performance from a STREAM socket than anything else. I will try to clear up the mystery of what a socket is, as well as some hints on how to work with blocking and non-blocking sockets. But I’ll start by talking about blocking sockets. You’ll need to know how they work before dealing with non-blocking sockets.

Part of the trouble with understanding these things is that “socket” can mean a number of subtly different things, depending on context. So first, let’s make a distinction between a “client” socket - an endpoint of a conversation, and a “server” socket, which is more like a switchboard operator. The client application (your browser, for example) uses “client” sockets exclusively; the web server it’s talking to uses both “server” sockets and “client” sockets.

History
-------

Of the various forms of IPC (Inter Process Communication), sockets are by far the most popular. On any given platform, there are likely to be other forms of IPC that are faster, but for cross-platform communication, sockets are about the only game in town.

They were invented in Berkeley as part of the BSD flavor of Unix. They spread like wildfire with the internet. With good reason — the combination of sockets with INET makes talking to arbitrary machines around the world unbelievably easy (at least compared to other schemes).

Creating a Socket
=================

Roughly speaking, when you clicked on the link that brought you to this page, your browser did something like the following:

   # create an INET, STREAMing socket
   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   # now connect to the web server on port 80 - the normal http port
   s.connect(("www.python.org", 80))

When the "connect" completes, the socket "s" can be used to send in a request for the text of the page. The same socket will read the reply, and then be destroyed. That’s right, destroyed. Client sockets are normally only used for one exchange (or a small set of sequential exchanges).

What happens in the web server is a bit more complex.
First, the web server creates a “server socket”:

   # create an INET, STREAMing socket
   serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   # bind the socket to a public host, and a well-known port
   serversocket.bind((socket.gethostname(), 80))
   # become a server socket
   serversocket.listen(5)

A couple things to notice: we used "socket.gethostname()" so that the socket would be visible to the outside world. If we had used "s.bind(('localhost', 80))" or "s.bind(('127.0.0.1', 80))" we would still have a “server” socket, but one that was only visible within the same machine. "s.bind(('', 80))" specifies that the socket is reachable by any address the machine happens to have.

A second thing to note: low number ports are usually reserved for “well known” services (HTTP, SNMP etc). If you’re playing around, use a nice high number (4 digits).

Finally, the argument to "listen" tells the socket library that we want it to queue up as many as 5 connect requests (the normal max) before refusing outside connections. If the rest of the code is written properly, that should be plenty.

Now that we have a “server” socket, listening on port 80, we can enter the mainloop of the web server:

   while True:
       # accept connections from outside
       (clientsocket, address) = serversocket.accept()
       # now do something with the clientsocket
       # in this case, we'll pretend this is a threaded server
       ct = client_thread(clientsocket)
       ct.run()

There are actually three general ways in which this loop could work - dispatching a thread to handle "clientsocket", creating a new process to handle "clientsocket", or restructuring this app to use non-blocking sockets, and multiplexing between our “server” socket and any active "clientsocket"s using "select". More about that later. The important thing to understand now is this: this is *all* a “server” socket does. It doesn’t send any data. It doesn’t receive any data. It just produces “client” sockets. Each "clientsocket" is created in response to some *other* “client” socket doing a "connect()" to the host and port we’re bound to. As soon as we’ve created that "clientsocket", we go back to listening for more connections. The two “clients” are free to chat it up - they are using some dynamically allocated port which will be recycled when the conversation ends.

IPC
---

If you need fast IPC between two processes on one machine, you should look into pipes or shared memory. If you do decide to use AF_INET sockets, bind the “server” socket to "'localhost'". On most platforms, this will take a shortcut around a couple of layers of network code and be quite a bit faster.

See also: The "multiprocessing" module integrates cross-platform IPC into a higher-level API.

Using a Socket
==============

The first thing to note, is that the web browser’s “client” socket and the web server’s “client” socket are identical beasts. That is, this is a “peer to peer” conversation. Or to put it another way, *as the designer, you will have to decide what the rules of etiquette are for a conversation*. Normally, the "connect"ing socket starts the conversation, by sending in a request, or perhaps a signon. But that’s a design decision - it’s not a rule of sockets.

Now there are two sets of verbs to use for communication. You can use "send" and "recv", or you can transform your client socket into a file-like beast and use "read" and "write". The latter is the way Java presents its sockets. I’m not going to talk about it here, except to warn you that you need to use "flush" on sockets.
These are buffered “files”, and a common mistake is to "write" something, and then "read" for a reply. Without a "flush" in there, you may wait forever for the reply, because the request may still be in your output buffer.

Now we come to the major stumbling block of sockets - "send" and "recv" operate on the network buffers. They do not necessarily handle all the bytes you hand them (or expect from them), because their major focus is handling the network buffers. In general, they return when the associated network buffers have been filled ("send") or emptied ("recv"). They then tell you how many bytes they handled. It is *your* responsibility to call them again until your message has been completely dealt with.

When a "recv" returns 0 bytes, it means the other side has closed (or is in the process of closing) the connection. You will not receive any more data on this connection. Ever. You may be able to send data successfully; I’ll talk more about this later.

A protocol like HTTP uses a socket for only one transfer. The client sends a request, then reads a reply. That’s it. The socket is discarded. This means that a client can detect the end of the reply by receiving 0 bytes.

But if you plan to reuse your socket for further transfers, you need to realize that *there is no* EOT (End of Transfer) *on a socket.* I repeat: if a socket "send" or "recv" returns after handling 0 bytes, the connection has been broken. If the connection has *not* been broken, you may wait on a "recv" forever, because the socket will *not* tell you that there’s nothing more to read (for now). Now if you think about that a bit, you’ll come to realize a fundamental truth of sockets: *messages must either be fixed length* (yuck), *or be delimited* (shrug), *or indicate how long they are* (much better), *or end by shutting down the connection*. The choice is entirely yours, (but some ways are righter than others).

Assuming you don’t want to end the connection, the simplest solution is a fixed length message:

   class MySocket:
       """demonstration class only
         - coded for clarity, not efficiency
       """

       def __init__(self, sock=None):
           if sock is None:
               self.sock = socket.socket(
                               socket.AF_INET, socket.SOCK_STREAM)
           else:
               self.sock = sock

       def connect(self, host, port):
           self.sock.connect((host, port))

       def mysend(self, msg):
           totalsent = 0
           while totalsent < MSGLEN:
               sent = self.sock.send(msg[totalsent:])
               if sent == 0:
                   raise RuntimeError("socket connection broken")
               totalsent = totalsent + sent

       def myreceive(self):
           chunks = []
           bytes_recd = 0
           while bytes_recd < MSGLEN:
               chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
               if chunk == b'':
                   raise RuntimeError("socket connection broken")
               chunks.append(chunk)
               bytes_recd = bytes_recd + len(chunk)
           return b''.join(chunks)

The sending code here is usable for almost any messaging scheme - in Python you send strings, and you can use "len()" to determine their length (even if they have embedded "\0" characters). It’s mostly the receiving code that gets more complex. (And in C, it’s not much worse, except you can’t use "strlen" if the message has embedded "\0"s.)

The easiest enhancement is to make the first character of the message an indicator of message type, and have the type determine the length. Now you have two "recv"s - the first to get (at least) that first character so you can look up the length, and the second in a loop to get the rest.
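As a minimal sketch of that two-"recv" idea (the framing here is invented for illustration: a hypothetical protocol whose first byte is a type code, with a lookup table giving each type’s fixed body length):

   # Hypothetical type-code -> body-length table; not part of any real protocol.
   MSG_LENGTHS = {b'A': 16, b'B': 64}

   def recv_exactly(sock, count):
       # Loop over recv until exactly `count` bytes have arrived.
       chunks = []
       while count > 0:
           chunk = sock.recv(min(count, 2048))
           if chunk == b'':
               raise RuntimeError("socket connection broken")
           chunks.append(chunk)
           count -= len(chunk)
       return b''.join(chunks)

   def recv_message(sock):
       # First recv loop: the one-byte type indicator.
       msg_type = recv_exactly(sock, 1)
       # Second recv loop: the fixed-length body for that type.
       return msg_type, recv_exactly(sock, MSG_LENGTHS[msg_type])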
If you decide to go the delimited route, you’ll be receiving in some arbitrary chunk size, (4096 or 8192 is frequently a good match for network buffer sizes), and scanning what you’ve received for a delimiter.

One complication to be aware of: if your conversational protocol allows multiple messages to be sent back to back (without some kind of reply), and you pass "recv" an arbitrary chunk size, you may end up reading the start of a following message. You’ll need to put that aside and hold onto it, until it’s needed.

Prefixing the message with its length (say, as 5 numeric characters) gets more complex, because (believe it or not), you may not get all 5 characters in one "recv". In playing around, you’ll get away with it; but in high network loads, your code will very quickly break unless you use two "recv" loops - the first to determine the length, the second to get the data part of the message. Nasty. This is also when you’ll discover that "send" does not always manage to get rid of everything in one pass. And despite having read this, you will eventually get bit by it!

In the interests of space, building your character, (and preserving my competitive position), these enhancements are left as an exercise for the reader. Let’s move on to cleaning up.

Binary Data
-----------

It is perfectly possible to send binary data over a socket. The major problem is that not all machines use the same formats for binary data. For example, network byte order is big-endian, with the most significant byte first, so a 16 bit integer with the value "1" would be the two hex bytes "00 01". However, most common processors (x86/AMD64, ARM, RISC-V), are little-endian, with the least significant byte first - that same "1" would be "01 00".

Socket libraries have calls for converting 16 and 32 bit integers - "ntohl, htonl, ntohs, htons" where “n” means *network* and “h” means *host*, “s” means *short* and “l” means *long*. Where network order is host order, these do nothing, but where the machine is byte-reversed, these swap the bytes around appropriately.

In these days of 64-bit machines, the ASCII representation of binary data is frequently smaller than the binary representation. That’s because a surprising amount of the time, most integers have the value 0, or maybe 1. The string "0" would be two bytes, while a full 64-bit integer would be 8. Of course, this doesn’t fit well with fixed-length messages. Decisions, decisions.

Disconnecting
=============

Strictly speaking, you’re supposed to use "shutdown" on a socket before you "close" it. The "shutdown" is an advisory to the socket at the other end. Depending on the argument you pass it, it can mean “I’m not going to send anymore, but I’ll still listen”, or “I’m not listening, good riddance!”. Most socket libraries, however, are so used to programmers neglecting to use this piece of etiquette that normally a "close" is the same as "shutdown(); close()". So in most situations, an explicit "shutdown" is not needed.

One way to use "shutdown" effectively is in an HTTP-like exchange. The client sends a request and then does a "shutdown(1)". This tells the server “This client is done sending, but can still receive.” The server can detect “EOF” by a receive of 0 bytes. It can assume it has the complete request. The server sends a reply. If the "send" completes successfully then, indeed, the client was still receiving.

Python takes the automatic shutdown a step further, and says that when a socket is garbage collected, it will automatically do a "close" if it’s needed.
But relying on this is a very bad habit. If your socket just disappears without doing a "close", the socket at the other end may hang indefinitely, thinking you’re just being slow. *Please* "close" your sockets when you’re done.

When Sockets Die
----------------

Probably the worst thing about using blocking sockets is what happens when the other side comes down hard (without doing a "close"). Your socket is likely to hang. TCP is a reliable protocol, and it will wait a long, long time before giving up on a connection. If you’re using threads, the entire thread is essentially dead. There’s not much you can do about it. As long as you aren’t doing something dumb, like holding a lock while doing a blocking read, the thread isn’t really consuming much in the way of resources. Do *not* try to kill the thread - part of the reason that threads are more efficient than processes is that they avoid the overhead associated with the automatic recycling of resources. In other words, if you do manage to kill the thread, your whole process is likely to be screwed up.

Non-blocking Sockets
====================

If you’ve understood the preceding, you already know most of what you need to know about the mechanics of using sockets. You’ll still use the same calls, in much the same ways. It’s just that, if you do it right, your app will be almost inside-out.

In Python, you use "socket.setblocking(False)" to make it non-blocking. In C, it’s more complex, (for one thing, you’ll need to choose between the BSD flavor "O_NONBLOCK" and the almost indistinguishable POSIX flavor "O_NDELAY", which is completely different from "TCP_NODELAY"), but it’s the exact same idea. You do this after creating the socket, but before using it. (Actually, if you’re nuts, you can switch back and forth.)

The major mechanical difference is that "send", "recv", "connect" and "accept" can return without having done anything. You have (of course) a number of choices. You can check return codes and error codes and generally drive yourself crazy. If you don’t believe me, try it sometime. Your app will grow large, buggy and suck CPU. So let’s skip the brain-dead solutions and do it right.

Use "select".

In C, coding "select" is fairly complex. In Python, it’s a piece of cake, but it’s close enough to the C version that if you understand "select" in Python, you’ll have little trouble with it in C:

   ready_to_read, ready_to_write, in_error = \
                  select.select(
                     potential_readers,
                     potential_writers,
                     potential_errs,
                     timeout)

You pass "select" three lists: the first contains all sockets that you might want to try reading; the second all the sockets you might want to try writing to, and the last (normally left empty) those that you want to check for errors. You should note that a socket can go into more than one list. The "select" call is blocking, but you can give it a timeout. This is generally a sensible thing to do - give it a nice long timeout (say a minute) unless you have good reason to do otherwise.

In return, you will get three lists. They contain the sockets that are actually readable, writable and in error. Each of these lists is a subset (possibly empty) of the corresponding list you passed in.

If a socket is in the output readable list, you can be as-close-to-certain-as-we-ever-get-in-this-business that a "recv" on that socket will return *something*. Same idea for the writable list. You’ll be able to send *something*. Maybe not all you want to, but *something* is better than nothing.
(Actually, any reasonably healthy socket will return as writable - it just means outbound network buffer space is available.)

If you have a “server” socket, put it in the potential_readers list. If it comes out in the readable list, your "accept" will (almost certainly) work. If you have created a new socket to "connect" to someone else, put it in the potential_writers list. If it shows up in the writable list, you have a decent chance that it has connected.

Actually, "select" can be handy even with blocking sockets. It’s one way of determining whether you will block - the socket returns as readable when there’s something in the buffers. However, this still doesn’t help with the problem of determining whether the other end is done, or just busy with something else.

**Portability alert**: On Unix, "select" works both with the sockets and files. Don’t try this on Windows. On Windows, "select" works with sockets only. Also note that in C, many of the more advanced socket options are done differently on Windows. In fact, on Windows I usually use threads (which work very, very well) with my sockets.

Sorting Techniques
******************

Author: Andrew Dalke and Raymond Hettinger

Python lists have a built-in "list.sort()" method that modifies the list in-place. There is also a "sorted()" built-in function that builds a new sorted list from an iterable.

In this document, we explore the various techniques for sorting data using Python.

Sorting Basics
==============

A simple ascending sort is very easy: just call the "sorted()" function. It returns a new sorted list:

   >>> sorted([5, 2, 3, 1, 4])
   [1, 2, 3, 4, 5]

You can also use the "list.sort()" method. It modifies the list in-place (and returns "None" to avoid confusion). Usually it’s less convenient than "sorted()" - but if you don’t need the original list, it’s slightly more efficient.

   >>> a = [5, 2, 3, 1, 4]
   >>> a.sort()
   >>> a
   [1, 2, 3, 4, 5]

Another difference is that the "list.sort()" method is only defined for lists. In contrast, the "sorted()" function accepts any iterable.

   >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
   [1, 2, 3, 4, 5]

Key Functions
=============

Both "list.sort()" and "sorted()" have a *key* parameter to specify a function (or other callable) to be called on each list element prior to making comparisons.

For example, here’s a case-insensitive string comparison:

   >>> sorted("This is a test string from Andrew".split(), key=str.casefold)
   ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

The value of the *key* parameter should be a function (or other callable) that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.

A common pattern is to sort complex objects using some of the object’s indices as keys. For example:

   >>> student_tuples = [
   ...     ('john', 'A', 15),
   ...     ('jane', 'B', 12),
   ...     ('dave', 'B', 10),
   ... ]
   >>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
   [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The same technique works for objects with named attributes. For example:

   >>> class Student:
   ...     def __init__(self, name, grade, age):
   ...         self.name = name
   ...         self.grade = grade
   ...         self.age = age
   ...     def __repr__(self):
   ...         return repr((self.name, self.grade, self.age))
   ...
   >>> student_objects = [
   ...     Student('john', 'A', 15),
   ...     Student('jane', 'B', 12),
   ...     Student('dave', 'B', 10),
   ... ]
   >>> sorted(student_objects, key=lambda student: student.age)   # sort by age
   [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

Objects with named attributes can be made by a regular class as shown above, or they can be instances of "dataclass" or a *named tuple*.

Operator Module Functions and Partial Function Evaluation
=========================================================

The *key function* patterns shown above are very common, so Python provides convenience functions to make accessor functions easier and faster. The "operator" module has "itemgetter()", "attrgetter()", and a "methodcaller()" function.

Using those functions, the above examples become simpler and faster:

   >>> from operator import itemgetter, attrgetter
   >>> sorted(student_tuples, key=itemgetter(2))
   [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
   >>> sorted(student_objects, key=attrgetter('age'))
   [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The operator module functions allow multiple levels of sorting. For example, to sort by *grade* then by *age*:

   >>> sorted(student_tuples, key=itemgetter(1, 2))
   [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
   >>> sorted(student_objects, key=attrgetter('grade', 'age'))
   [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

The "functools" module provides another helpful tool for making key-functions. The "partial()" function can reduce the arity of a multi-argument function making it suitable for use as a key-function.

   >>> from functools import partial
   >>> from unicodedata import normalize
   >>> names = 'Zoë Åbjørn Núñez Élana Zeke Abe Nubia Eloise'.split()
   >>> sorted(names, key=partial(normalize, 'NFD'))
   ['Abe', 'Åbjørn', 'Eloise', 'Élana', 'Nubia', 'Núñez', 'Zeke', 'Zoë']
   >>> sorted(names, key=partial(normalize, 'NFC'))
   ['Abe', 'Eloise', 'Nubia', 'Núñez', 'Zeke', 'Zoë', 'Åbjørn', 'Élana']

Ascending and Descending
========================

Both "list.sort()" and "sorted()" accept a *reverse* parameter with a boolean value. This is used to flag descending sorts. For example, to get the student data in reverse *age* order:

   >>> sorted(student_tuples, key=itemgetter(2), reverse=True)
   [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
   >>> sorted(student_objects, key=attrgetter('age'), reverse=True)
   [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

Sort Stability and Complex Sorts
================================

Sorts are guaranteed to be stable. That means that when multiple records have the same key, their original order is preserved.

   >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
   >>> sorted(data, key=itemgetter(0))
   [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]

Notice how the two records for *blue* retain their original order so that "('blue', 1)" is guaranteed to precede "('blue', 2)".

This wonderful property lets you build complex sorts in a series of sorting steps. For example, to sort the student data by descending *grade* and then ascending *age*, do the *age* sort first and then sort again using *grade*:

   >>> s = sorted(student_objects, key=attrgetter('age'))   # sort on secondary key
   >>> sorted(s, key=attrgetter('grade'), reverse=True)     # now sort on primary key, descending
   [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

This can be abstracted out into a wrapper function that can take a list and tuples of field and order to sort them on multiple passes.

   >>> def multisort(xs, specs):
   ...     for key, reverse in reversed(specs):
   ...         xs.sort(key=attrgetter(key), reverse=reverse)
   ...     return xs
   ...
   >>> multisort(list(student_objects), (('grade', True), ('age', False)))
   [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The Timsort algorithm used in Python does multiple sorts efficiently because it can take advantage of any ordering already present in a dataset.

Decorate-Sort-Undecorate
========================

This idiom is called Decorate-Sort-Undecorate after its three steps:

* First, the initial list is decorated with new values that control the sort order.

* Second, the decorated list is sorted.

* Finally, the decorations are removed, creating a list that contains only the initial values in the new order.

For example, to sort the student data by *grade* using the DSU approach:

   >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
   >>> decorated.sort()
   >>> [student for grade, i, student in decorated]   # undecorate
   [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

This idiom works because tuples are compared lexicographically; the first items are compared; if they are the same then the second items are compared, and so on.

It is not strictly necessary in all cases to include the index *i* in the decorated list, but including it gives two benefits:

* The sort is stable – if two items have the same key, their order will be preserved in the sorted list.

* The original items do not have to be comparable because the ordering of the decorated tuples will be determined by at most the first two items. So for example the original list could contain complex numbers which cannot be sorted directly.

Another name for this idiom is Schwartzian transform, after Randal L. Schwartz, who popularized it among Perl programmers.

Now that Python sorting provides key-functions, this technique is not often needed.

Comparison Functions
====================

Unlike key functions that return an absolute value for sorting, a comparison function computes the relative ordering for two inputs. For example, a balance scale compares two samples giving a relative ordering: lighter, equal, or heavier. Likewise, a comparison function such as "cmp(a, b)" will return a negative value for less-than, zero if the inputs are equal, or a positive value for greater-than.

It is common to encounter comparison functions when translating algorithms from other languages. Also, some libraries provide comparison functions as part of their API. For example, "locale.strcoll()" is a comparison function.

To accommodate those situations, Python provides "functools.cmp_to_key" to wrap the comparison function to make it usable as a key function:

   sorted(words, key=cmp_to_key(strcoll))  # locale-aware sort order

Odds and Ends
=============

* For locale aware sorting, use "locale.strxfrm()" for a key function or "locale.strcoll()" for a comparison function. This is necessary because “alphabetical” sort orderings can vary across cultures even if the underlying alphabet is the same.

* The *reverse* parameter still maintains sort stability (so that records with equal keys retain the original order).
  Interestingly, that effect can be simulated without the parameter by using the builtin "reversed()" function twice:

     >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
     >>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
     >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
     >>> assert standard_way == double_reversed
     >>> standard_way
     [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]

* The sort routines use "<" when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an "__lt__()" method:

     >>> Student.__lt__ = lambda self, other: self.age < other.age
     >>> sorted(student_objects)
     [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

  However, note that "<" can fall back to using "__gt__()" if "__lt__()" is not implemented (see "object.__lt__()" for details on the mechanics). To avoid surprises, **PEP 8** recommends that all six comparison methods be implemented. The "total_ordering()" decorator is provided to make that task easier.

* Key functions need not depend directly on the objects being sorted. A key function can also access external resources. For instance, if the student grades are stored in a dictionary, they can be used to sort a separate list of student names:

     >>> students = ['dave', 'john', 'jane']
     >>> newgrades = {'john': 'F', 'jane': 'A', 'dave': 'C'}
     >>> sorted(students, key=newgrades.__getitem__)
     ['jane', 'dave', 'john']

Partial Sorts
=============

Some applications require only some of the data to be ordered. The standard library provides several tools that do less work than a full sort:

* "min()" and "max()" return the smallest and largest values, respectively. These functions make a single pass over the input data and require almost no auxiliary memory.

* "heapq.nsmallest()" and "heapq.nlargest()" return the *n* smallest and largest values, respectively. These functions make a single pass over the data keeping only *n* elements in memory at a time. For values of *n* that are small relative to the number of inputs, these functions make far fewer comparisons than a full sort.

* "heapq.heappush()" and "heapq.heappop()" create and maintain a partially sorted arrangement of data that keeps the smallest element at position "0". These functions are suitable for implementing priority queues which are commonly used for task scheduling.

timer file descriptor HOWTO
***************************

Release: 1.13

This HOWTO discusses Python’s support for the Linux timer file descriptor.

Examples
========

The following example shows how to use a timer file descriptor to perform an action twice a second:

   # Practical scripts should really use a non-blocking timer;
   # we use a blocking timer here for simplicity.
   import os, time

   # Create the timer file descriptor
   fd = os.timerfd_create(time.CLOCK_REALTIME)

   # Start the timer in 1 second, with an interval of half a second
   os.timerfd_settime(fd, initial=1, interval=0.5)

   try:
       # Process timer events four times.
       for _ in range(4):
           # read() will block until the timer expires
           _ = os.read(fd, 8)
           print("Timer expired")
   finally:
       # Remember to close the timer file descriptor!
       os.close(fd)

To avoid the precision loss caused by the "float" type, timer file descriptors allow specifying initial expiration and interval in integer nanoseconds with "_ns" variants of the functions.
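For instance, the blocking example above could be rewritten with the "_ns" variants; this is a minimal sketch under the same four-expiration setup:

   import os, time

   fd = os.timerfd_create(time.CLOCK_REALTIME)
   one_sec_in_nsec = 10**9
   # Start in exactly one second, then fire every half second, with both
   # values given as exact integer nanosecond counts (no float rounding).
   os.timerfd_settime_ns(fd, initial=one_sec_in_nsec,
                         interval=one_sec_in_nsec // 2)
   try:
       for _ in range(4):
           _ = os.read(fd, 8)  # blocks until the timer expires
           print("Timer expired")
   finally:
       os.close(fd)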
This example shows how "epoll()" can be used with timer file descriptors to wait until the file descriptor is ready for reading:

   import os, time, select, socket, sys

   # Create an epoll object
   ep = select.epoll()

   # In this example, use loopback address to send "stop" command to the server.
   #
   # $ telnet 127.0.0.1 1234
   # Trying 127.0.0.1...
   # Connected to 127.0.0.1.
   # Escape character is '^]'.
   # stop
   # Connection closed by foreign host.
   #
   sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   sock.bind(("127.0.0.1", 1234))
   sock.setblocking(False)
   sock.listen(1)
   ep.register(sock, select.EPOLLIN)

   # Create timer file descriptors in non-blocking mode.
   num = 3
   fds = []
   for _ in range(num):
       fd = os.timerfd_create(time.CLOCK_REALTIME, flags=os.TFD_NONBLOCK)
       fds.append(fd)
       # Register the timer file descriptor for read events
       ep.register(fd, select.EPOLLIN)

   # Start the timers with os.timerfd_settime_ns() in nanoseconds.
   # Timer 1 fires every 0.25 seconds; timer 2 every 0.5 seconds; etc.
   for i, fd in enumerate(fds, start=1):
       one_sec_in_nsec = 10**9
       i = i * one_sec_in_nsec
       os.timerfd_settime_ns(fd, initial=i//4, interval=i//4)

   timeout = 3
   try:
       conn = None
       is_active = True
       while is_active:
           # Wait up to 3 seconds for a timer to expire.
           # epoll.poll() returns a list of (fd, event) pairs, where fd is a
           # file descriptor. sock and conn (the value returned by
           # socket.accept()) are socket objects, not file descriptors, so
           # use sock.fileno() and conn.fileno() to get their descriptors.
           events = ep.poll(timeout)

           # If more than one timer file descriptor is ready for reading at
           # once, epoll.poll() returns a pair for each of them.
           #
           # With this example's settings:
           # the 1st timer fires every 0.25 seconds, starting at 0.25 seconds (0.25, 0.5, 0.75, 1.0, ...),
           # the 2nd timer every 0.5 seconds, starting at 0.5 seconds (0.5, 1.0, 1.5, 2.0, ...),
           # the 3rd timer every 0.75 seconds, starting at 0.75 seconds (0.75, 1.5, 2.25, 3.0, ...).
           #
           # At 0.25 seconds, only the 1st timer fires.
           # At 0.5 seconds, the 1st and 2nd timers fire at once.
           # At 0.75 seconds, the 1st and 3rd timers fire at once.
           # At 1.5 seconds, the 1st, 2nd and 3rd timers fire at once.
           #
           # If a timer file descriptor has been signaled more than once
           # since the last os.read() call, os.read() returns the number of
           # expirations as an 8-byte integer in host byte order.
           print(f"Signaled events={events}")

           for fd, event in events:
               if event & select.EPOLLIN:
                   if fd == sock.fileno():
                       # Check if there is a connection request.
                       print(f"Accepting connection {fd}")
                       conn, addr = sock.accept()
                       conn.setblocking(False)
                       print(f"Accepted connection {conn} from {addr}")
                       ep.register(conn, select.EPOLLIN)
                   elif conn and fd == conn.fileno():
                       # Check if there is data to read.
                       print(f"Reading data {fd}")
                       data = conn.recv(1024)
                       if data:
                           # You should catch UnicodeDecodeError exception for safety.
                           cmd = data.decode()
                           if cmd.startswith("stop"):
                               print("Stopping server")
                               is_active = False
                           else:
                               print(f"Unknown command: {cmd}")
                       else:
                           # No more data, close connection
                           print(f"Closing connection {fd}")
                           ep.unregister(conn)
                           conn.close()
                           conn = None
                   elif fd in fds:
                       print(f"Reading timer {fd}")
                       count = int.from_bytes(os.read(fd, 8), byteorder=sys.byteorder)
                       print(f"Timer {fds.index(fd) + 1} expired {count} times")
                   else:
                       print(f"Unknown file descriptor {fd}")
   finally:
       for fd in fds:
           ep.unregister(fd)
           os.close(fd)
       ep.close()

This example shows how "select()" can be used with timer file descriptors to wait until the file descriptor is ready for reading:

   import os, time, select, socket, sys

   # In this example, use loopback address to send "stop" command to the server.
   #
   # $ telnet 127.0.0.1 1234
   # Trying 127.0.0.1...
   # Connected to 127.0.0.1.
   # Escape character is '^]'.
   # stop
   # Connection closed by foreign host.
   #
   sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   sock.bind(("127.0.0.1", 1234))
   sock.setblocking(False)
   sock.listen(1)

   # Create timer file descriptors in non-blocking mode.
   num = 3
   fds = [os.timerfd_create(time.CLOCK_REALTIME, flags=os.TFD_NONBLOCK)
          for _ in range(num)]
   select_fds = fds + [sock]

   # Start the timers with os.timerfd_settime() in seconds.
   # Timer 1 fires every 0.25 seconds; timer 2 every 0.5 seconds; etc.
   for i, fd in enumerate(fds, start=1):
       os.timerfd_settime(fd, initial=i/4, interval=i/4)

   timeout = 3
   try:
       conn = None
       is_active = True
       while is_active:
           # Wait up to 3 seconds for a timer to expire.
           # select.select() returns three lists of ready file descriptors
           # or objects.
           rfd, wfd, xfd = select.select(select_fds, select_fds, select_fds, timeout)
           for fd in rfd:
               if fd == sock:
                   # Check if there is a connection request.
                   print(f"Accepting connection {fd}")
                   conn, addr = sock.accept()
                   conn.setblocking(False)
                   print(f"Accepted connection {conn} from {addr}")
                   select_fds.append(conn)
               elif conn and fd == conn:
                   # Check if there is data to read.
                   print(f"Reading data {fd}")
                   data = conn.recv(1024)
                   if data:
                       # You should catch UnicodeDecodeError exception for safety.
                       cmd = data.decode()
                       if cmd.startswith("stop"):
                           print("Stopping server")
                           is_active = False
                       else:
                           print(f"Unknown command: {cmd}")
                   else:
                       # No more data, close connection
                       print(f"Closing connection {fd}")
                       select_fds.remove(conn)
                       conn.close()
                       conn = None
               elif fd in fds:
                   print(f"Reading timer {fd}")
                   count = int.from_bytes(os.read(fd, 8), byteorder=sys.byteorder)
                   print(f"Timer {fds.index(fd) + 1} expired {count} times")
               else:
                   print(f"Unknown file descriptor {fd}")
   finally:
       for fd in fds:
           os.close(fd)
       sock.close()
       sock = None

Unicode HOWTO
*************

Release: 1.12

This HOWTO discusses Python’s support for the Unicode specification for representing textual data, and explains various problems that people commonly encounter when trying to work with Unicode.

Introduction to Unicode
=======================

Definitions
-----------

Today’s programs need to be able to handle a wide variety of characters. Applications are often internationalized to display messages and output in a variety of user-selectable languages; the same program might need to output an error message in English, French, Japanese, Hebrew, or Russian. Web content can be written in any of these languages and can also include a variety of emoji symbols. Python’s string type uses the Unicode Standard for representing characters, which lets Python programs work with all these different possible characters.
Unicode (https://www.unicode.org/) is a specification that aims to list every character used by human languages and give each character its own unique code. The Unicode specifications are continually revised and updated to add new languages and symbols.

A **character** is the smallest possible component of a text. ‘A’, ‘B’, ‘C’, etc., are all different characters. So are ‘È’ and ‘Í’. Characters vary depending on the language or context you’re talking about. For example, there’s a character for “Roman Numeral One”, ‘Ⅰ’, that’s separate from the uppercase letter ‘I’. They’ll usually look the same, but these are two different characters that have different meanings.

The Unicode standard describes how characters are represented by **code points**. A code point value is an integer in the range 0 to 0x10FFFF (about 1.1 million values, the actual number assigned is less than that). In the standard and in this document, a code point is written using the notation "U+265E" to mean the character with value "0x265e" (9,822 in decimal). The Unicode standard contains a lot of tables listing characters and their corresponding code points:

   0061    'a'; LATIN SMALL LETTER A
   0062    'b'; LATIN SMALL LETTER B
   0063    'c'; LATIN SMALL LETTER C
   ...
   007B    '{'; LEFT CURLY BRACKET
   ...
   2167    'Ⅷ'; ROMAN NUMERAL EIGHT
   2168    'Ⅸ'; ROMAN NUMERAL NINE
   ...
   265E    '♞'; BLACK CHESS KNIGHT
   265F    '♟'; BLACK CHESS PAWN
   ...
   1F600   '😀'; GRINNING FACE
   1F609   '😉'; WINKING FACE
   ...

Strictly, these definitions imply that it’s meaningless to say ‘this is character "U+265E"’. "U+265E" is a code point, which represents some particular character; in this case, it represents the character ‘BLACK CHESS KNIGHT’, ‘♞’. In informal contexts, this distinction between code points and characters will sometimes be forgotten.

A character is represented on a screen or on paper by a set of graphical elements that’s called a **glyph**. The glyph for an uppercase A, for example, is two diagonal strokes and a horizontal stroke, though the exact details will depend on the font being used. Most Python code doesn’t need to worry about glyphs; figuring out the correct glyph to display is generally the job of a GUI toolkit or a terminal’s font renderer.

Encodings
---------

To summarize the previous section: a Unicode string is a sequence of code points, which are numbers from 0 through "0x10FFFF" (1,114,111 decimal). This sequence of code points needs to be represented in memory as a set of **code units**, and **code units** are then mapped to 8-bit bytes. The rules for translating a Unicode string into a sequence of bytes are called a **character encoding**, or just an **encoding**.

The first encoding you might think of is using 32-bit integers as the code unit, and then using the CPU’s representation of 32-bit integers. In this representation, the string “Python” might look like this:

      P           y           t           h           o           n
   0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

This representation is straightforward but using it presents a number of problems.

1. It’s not portable; different processors order the bytes differently.

2. It’s very wasteful of space. In most texts, the majority of the code points are less than 127, or less than 255, so a lot of space is occupied by "0x00" bytes. The above string takes 24 bytes compared to the 6 bytes needed for an ASCII representation.
   Increased RAM usage doesn’t matter too much (desktop computers have gigabytes of RAM, and strings aren’t usually that large), but expanding our usage of disk and network bandwidth by a factor of 4 is intolerable.

3. It’s not compatible with existing C functions such as "strlen()", so a new family of wide string functions would need to be used.

Therefore this encoding isn’t used very much, and people instead choose other encodings that are more efficient and convenient, such as UTF-8.

UTF-8 is one of the most commonly used encodings, and Python often defaults to using it. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit values are used in the encoding. (There are also UTF-16 and UTF-32 encodings, but they are less frequently used than UTF-8.) UTF-8 uses the following rules:

1. If the code point is < 128, it’s represented by the corresponding byte value.

2. If the code point is >= 128, it’s turned into a sequence of two, three, or four bytes, where each byte of the sequence is between 128 and 255.

UTF-8 has several convenient properties:

1. It can handle any Unicode code point.

2. A Unicode string is turned into a sequence of bytes that contains embedded zero bytes only where they represent the null character (U+0000). This means that UTF-8 strings can be processed by C functions such as "strcpy()" and sent through protocols that can’t handle zero bytes for anything other than end-of-string markers.

3. A string of ASCII text is also valid UTF-8 text.

4. UTF-8 is fairly compact; the majority of commonly used characters can be represented with one or two bytes.

5. If bytes are corrupted or lost, it’s possible to determine the start of the next UTF-8-encoded code point and resynchronize. It’s also unlikely that random 8-bit data will look like valid UTF-8.

6. UTF-8 is a byte oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes. This avoids the byte-ordering issues that can occur with integer and word oriented encodings, like UTF-16 and UTF-32, where the sequence of bytes varies depending on the hardware on which the string was encoded.

References
----------

The Unicode Consortium site has character charts, a glossary, and PDF versions of the Unicode specification. Be prepared for some difficult reading. A chronology of the origin and development of Unicode is also available on the site.

On the Computerphile Youtube channel, Tom Scott briefly discusses the history of Unicode and UTF-8 (9 minutes 36 seconds).

To help understand the standard, Jukka Korpela has written an introductory guide to reading the Unicode character tables.

Another good introductory article was written by Joel Spolsky. If this introduction didn’t make things clear to you, you should try reading this alternate article before continuing.

Wikipedia entries are often helpful; see the entries for “character encoding” and UTF-8, for example.

Python’s Unicode Support
========================

Now that you’ve learned the rudiments of Unicode, we can look at Python’s Unicode features.

The String Type
---------------

Since Python 3.0, the language’s "str" type contains Unicode characters, meaning any string created using ""unicode rocks!"", "'unicode rocks!'", or the triple-quoted string syntax is stored as Unicode.

The default encoding for Python source code is UTF-8, so you can simply include a Unicode character in a string literal:

   try:
       with open('/tmp/input.txt', 'r') as f:
           ...
   except OSError:
       # 'File not found' error message.
print("Fichier non trouvé") Side note: Python 3 also supports using Unicode characters in identifiers: répertoire = "/tmp/records.log" with open(répertoire, "w") as f: f.write("test\n") If you can’t enter a particular character in your editor or want to keep the source code ASCII-only for some reason, you can also use escape sequences in string literals. (Depending on your system, you may see the actual capital-delta glyph instead of a u escape.) >>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name '\u0394' >>> "\u0394" # Using a 16-bit hex value '\u0394' >>> "\U00000394" # Using a 32-bit hex value '\u0394' In addition, one can create a string using the "decode()" method of "bytes". This method takes an *encoding* argument, such as "UTF-8", and optionally an *errors* argument. The *errors* argument specifies the response when the input string can’t be converted according to the encoding’s rules. Legal values for this argument are "'strict'" (raise a "UnicodeDecodeError" exception), "'replace'" (use "U+FFFD", "REPLACEMENT CHARACTER"), "'ignore'" (just leave the character out of the Unicode result), or "'backslashreplace'" (inserts a "\xNN" escape sequence). The following examples show the differences: >>> b'\x80abc'.decode("utf-8", "strict") Traceback (most recent call last): ... UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte >>> b'\x80abc'.decode("utf-8", "replace") '\ufffdabc' >>> b'\x80abc'.decode("utf-8", "backslashreplace") '\\x80abc' >>> b'\x80abc'.decode("utf-8", "ignore") 'abc' Encodings are specified as strings containing the encoding’s name. Python comes with roughly 100 different encodings; see the Python Library Reference at Standard Encodings for a list. Some encodings have multiple names; for example, "'latin-1'", "'iso_8859_1'" and "'8859"’ are all synonyms for the same encoding. One-character Unicode strings can also be created with the "chr()" built-in function, which takes integers and returns a Unicode string of length 1 that contains the corresponding code point. The reverse operation is the built-in "ord()" function that takes a one-character Unicode string and returns the code point value: >>> chr(57344) '\ue000' >>> ord('\ue000') 57344 Converting to Bytes ------------------- The opposite method of "bytes.decode()" is "str.encode()", which returns a "bytes" representation of the Unicode string, encoded in the requested *encoding*. The *errors* parameter is the same as the parameter of the "decode()" method but supports a few more possible handlers. As well as "'strict'", "'ignore'", and "'replace'" (which in this case inserts a question mark instead of the unencodable character), there is also "'xmlcharrefreplace'" (inserts an XML character reference), "backslashreplace" (inserts a "\uNNNN" escape sequence) and "namereplace" (inserts a "\N{...}" escape sequence). The following example shows the different results: >>> u = chr(40960) + 'abcd' + chr(1972) >>> u.encode('utf-8') b'\xea\x80\x80abcd\xde\xb4' >>> u.encode('ascii') Traceback (most recent call last): ... UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128) >>> u.encode('ascii', 'ignore') b'abcd' >>> u.encode('ascii', 'replace') b'?abcd?' 
>>> u.encode('ascii', 'xmlcharrefreplace') b'ꀀabcd޴' >>> u.encode('ascii', 'backslashreplace') b'\\ua000abcd\\u07b4' >>> u.encode('ascii', 'namereplace') b'\\N{YI SYLLABLE IT}abcd\\u07b4' The low-level routines for registering and accessing the available encodings are found in the "codecs" module. Implementing new encodings also requires understanding the "codecs" module. However, the encoding and decoding functions returned by this module are usually more low-level than is comfortable, and writing new encodings is a specialized task, so the module won’t be covered in this HOWTO. Unicode Literals in Python Source Code -------------------------------------- In Python source code, specific Unicode code points can be written using the "\u" escape sequence, which is followed by four hex digits giving the code point. The "\U" escape sequence is similar, but expects eight hex digits, not four: >>> s = "a\xac\u1234\u20ac\U00008000" ... # ^^^^ two-digit hex escape ... # ^^^^^^ four-digit Unicode escape ... # ^^^^^^^^^^ eight-digit Unicode escape >>> [ord(c) for c in s] [97, 172, 4660, 8364, 32768] Using escape sequences for code points greater than 127 is fine in small doses, but becomes an annoyance if you’re using many accented characters, as you would in a program with messages in French or some other accent-using language. You can also assemble strings using the "chr()" built-in function, but this is even more tedious. Ideally, you’d want to be able to write literals in your language’s natural encoding. You could then edit Python source code with your favorite editor which would display the accented characters naturally, and have the right characters used at runtime. Python supports writing source code in UTF-8 by default, but you can use almost any encoding if you declare the encoding being used. This is done by including a special comment as either the first or second line of the source file: #!/usr/bin/env python # -*- coding: latin-1 -*- u = 'abcdé' print(ord(u[-1])) The syntax is inspired by Emacs’s notation for specifying variables local to a file. Emacs supports many different variables, but Python only supports ‘coding’. The "-*-" symbols indicate to Emacs that the comment is special; they have no significance to Python but are a convention. Python looks for "coding: name" or "coding=name" in the comment. If you don’t include such a comment, the default encoding used will be UTF-8 as already mentioned. See also **PEP 263** for more information. Unicode Properties ------------------ The Unicode specification includes a database of information about code points. For each defined code point, the information includes the character’s name, its category, the numeric value if applicable (for characters representing numeric concepts such as the Roman numerals, fractions such as one-third and four-fifths, etc.). There are also display-related properties, such as how to use the code point in bidirectional text. 
The following program displays some information about several characters, and prints the numeric value of one particular character: import unicodedata u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231) for i, c in enumerate(u): print(i, '%04x' % ord(c), unicodedata.category(c), end=" ") print(unicodedata.name(c)) # Get numeric value of second character print(unicodedata.numeric(u[1])) When run, this prints: 0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE 1 0bf2 No TAMIL NUMBER ONE THOUSAND 2 0f84 Mn TIBETAN MARK HALANTA 3 1770 Lo TAGBANWA LETTER SA 4 33af So SQUARE RAD OVER S SQUARED 1000.0 The category codes are abbreviations describing the nature of the character. These are grouped into categories such as “Letter”, “Number”, “Punctuation”, or “Symbol”, which in turn are broken up into subcategories. To take the codes from the above output, "'Ll'" means ‘Letter, lowercase’, "'No'" means “Number, other”, "'Mn'" is “Mark, nonspacing”, and "'So'" is “Symbol, other”. See the General Category Values section of the Unicode Character Database documentation for a list of category codes. Comparing Strings ----------------- Unicode adds some complication to comparing strings, because the same set of characters can be represented by different sequences of code points. For example, a letter like ‘ê’ can be represented as a single code point U+00EA, or as U+0065 U+0302, which is the code point for ‘e’ followed by a code point for ‘COMBINING CIRCUMFLEX ACCENT’. These will produce the same output when printed, but one is a string of length 1 and the other is of length 2. One tool for a case-insensitive comparison is the "casefold()" string method that converts a string to a case-insensitive form following an algorithm described by the Unicode Standard. This algorithm has special handling for characters such as the German letter ‘ß’ (code point U+00DF), which becomes the pair of lowercase letters ‘ss’. >>> street = 'Gürzenichstraße' >>> street.casefold() 'gürzenichstrasse' A second tool is the "unicodedata" module’s "normalize()" function that converts strings to one of several normal forms, where letters followed by a combining character are replaced with single characters. "normalize()" can be used to perform string comparisons that won’t falsely report inequality if two strings use combining characters differently: import unicodedata def compare_strs(s1, s2): def NFD(s): return unicodedata.normalize('NFD', s) return NFD(s1) == NFD(s2) single_char = 'ê' multiple_chars = '\N{LATIN SMALL LETTER E}\N{COMBINING CIRCUMFLEX ACCENT}' print('length of first string=', len(single_char)) print('length of second string=', len(multiple_chars)) print(compare_strs(single_char, multiple_chars)) When run, this outputs: $ python compare-strs.py length of first string= 1 length of second string= 2 True The first argument to the "normalize()" function is a string giving the desired normalization form, which can be one of ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’. The Unicode Standard also specifies how to do caseless comparisons: import unicodedata def compare_caseless(s1, s2): def NFD(s): return unicodedata.normalize('NFD', s) return NFD(NFD(s1).casefold()) == NFD(NFD(s2).casefold()) # Example usage single_char = 'ê' multiple_chars = '\N{LATIN CAPITAL LETTER E}\N{COMBINING CIRCUMFLEX ACCENT}' print(compare_caseless(single_char, multiple_chars)) This will print "True". (Why is "NFD()" invoked twice? 
Because there are a few characters that make "casefold()" return a non-normalized string, so the result needs to be normalized again. See section 3.13 of the Unicode Standard for a discussion and an example.) Unicode Regular Expressions --------------------------- The regular expressions supported by the "re" module can be provided either as bytes or strings. Some of the special character sequences such as "\d" and "\w" have different meanings depending on whether the pattern is supplied as bytes or a string. For example, "\d" will match the characters "[0-9]" in bytes but in strings will match any character that’s in the "'Nd'" category. The string in this example has the number 57 written in both Thai and Arabic numerals: import re p = re.compile(r'\d+') s = "Over \u0e55\u0e57 57 flavours" m = p.search(s) print(repr(m.group())) When executed, "\d+" will match the Thai numerals and print them out. If you supply the "re.ASCII" flag to "compile()", "\d+" will match the substring “57” instead. Similarly, "\w" matches a wide variety of Unicode characters but only "[a-zA-Z0-9_]" in bytes or if "re.ASCII" is supplied, and "\s" will match either Unicode whitespace characters or "[ \t\n\r\f\v]". References ---------- Some good alternative discussions of Python’s Unicode support are: * Processing Text Files in Python 3, by Nick Coghlan. * Pragmatic Unicode, a PyCon 2012 presentation by Ned Batchelder. The "str" type is described in the Python library reference at Text Sequence Type — str. The documentation for the "unicodedata" module. The documentation for the "codecs" module. Marc-André Lemburg gave a presentation titled “Python and Unicode” (PDF slides) at EuroPython 2002. The slides are an excellent overview of the design of Python 2’s Unicode features (where the Unicode string type is called "unicode" and literals start with "u"). Reading and Writing Unicode Data ================================ Once you’ve written some code that works with Unicode data, the next problem is input/output. How do you get Unicode strings into your program, and how do you convert Unicode into a form suitable for storage or transmission? It’s possible that you may not need to do anything depending on your input sources and output destinations; you should check whether the libraries used in your application support Unicode natively. XML parsers often return Unicode data, for example. Many relational databases also support Unicode-valued columns and can return Unicode values from an SQL query. Unicode data is usually converted to a particular encoding before it gets written to disk or sent over a socket. It’s possible to do all the work yourself: open a file, read an 8-bit bytes object from it, and convert the bytes with "bytes.decode(encoding)". However, the manual approach is not recommended. One problem is the multi-byte nature of encodings; one Unicode character can be represented by several bytes. If you want to read the file in arbitrary-sized chunks (say, 1024 or 4096 bytes), you need to write error-handling code to catch the case where only part of the bytes encoding a single Unicode character are read at the end of a chunk. One solution would be to read the entire file into memory and then perform the decoding, but that prevents you from working with files that are extremely large; if you need to read a 2 GiB file, you need 2 GiB of RAM. (More, really, since for at least a moment you’d need to have both the encoded string and its Unicode version in memory.) 
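Before looking at the solution, it may help to see the failure mode concretely. The following sketch is not part of the original discussion; ‘é’ is just a convenient two-byte example. It shows a multi-byte character split across a chunk boundary, and how an incremental decoder from the "codecs" module copes where a plain "decode()" cannot:

   import codecs

   data = 'é'.encode('utf-8')           # b'\xc3\xa9' -- a two-byte sequence
   chunk1, chunk2 = data[:1], data[1:]  # split mid-character

   try:
       chunk1.decode('utf-8')           # naive decoding of a partial chunk
   except UnicodeDecodeError as exc:
       print('naive decode failed:', exc)

   decoder = codecs.getincrementaldecoder('utf-8')()
   print(repr(decoder.decode(chunk1)))  # '' -- buffers the incomplete byte
   print(repr(decoder.decode(chunk2)))  # 'é' -- completes the character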
The solution would be to use the low-level decoding interface to catch the case of partial coding sequences. The work of implementing this has already been done for you: the built-in "open()" function can return a file-like object that assumes the file’s contents are in a specified encoding and accepts Unicode parameters for methods such as "read()" and "write()". This works through "open()"'s *encoding* and *errors* parameters which are interpreted just like those in "str.encode()" and "bytes.decode()".

Reading Unicode from a file is therefore simple:

   with open('unicode.txt', encoding='utf-8') as f:
       for line in f:
           print(repr(line))

It’s also possible to open files in update mode, allowing both reading and writing:

   with open('test', encoding='utf-8', mode='w+') as f:
       f.write('\u4500 blah blah blah\n')
       f.seek(0)
       print(repr(f.readline()[:1]))

The Unicode character "U+FEFF" is used as a byte-order mark (BOM), and is often written as the first character of a file in order to assist with autodetection of the file’s byte ordering. Some encodings, such as UTF-16, expect a BOM to be present at the start of a file; when such an encoding is used, the BOM will be automatically written as the first character and will be silently dropped when the file is read. There are variants of these encodings, such as ‘utf-16-le’ and ‘utf-16-be’ for little-endian and big-endian encodings, that specify one particular byte ordering and don’t skip the BOM.

In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte-order dependent. The mark simply announces that the file is encoded in UTF-8. For reading such files, use the ‘utf-8-sig’ codec to automatically skip the mark if present.

Unicode filenames
-----------------

Most of the operating systems in common use today support filenames that contain arbitrary Unicode characters. Usually this is implemented by converting the Unicode string into some encoding that varies depending on the system. Today Python is converging on using UTF-8: Python on macOS has used UTF-8 for several versions, and Python 3.6 switched to using UTF-8 on Windows as well. On Unix systems, there will only be a *filesystem encoding* if you’ve set the "LANG" or "LC_CTYPE" environment variables; if you haven’t, the default encoding is again UTF-8.

The "sys.getfilesystemencoding()" function returns the encoding to use on your current system, in case you want to do the encoding manually, but there’s not much reason to bother. When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you:

   filename = 'filename\u4500abc'
   with open(filename, 'w') as f:
       f.write('blah\n')

Functions in the "os" module such as "os.stat()" will also accept Unicode filenames.

The "os.listdir()" function returns filenames, which raises an issue: should it return the Unicode version of filenames, or should it return bytes containing the encoded versions? "os.listdir()" can do both, depending on whether you provided the directory path as bytes or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing a byte path will return the filenames as bytes.
For example, assuming the default *filesystem encoding* is UTF-8, running the following program:

   fn = 'filename\u4500abc'
   f = open(fn, 'w')
   f.close()

   import os
   print(os.listdir(b'.'))
   print(os.listdir('.'))

will produce the following output:

   $ python listdir-test.py
   [b'filename\xe4\x94\x80abc', ...]
   ['filename\u4500abc', ...]

The first list contains UTF-8-encoded filenames, and the second list contains the Unicode versions. Note that on most occasions, you can just stick with using Unicode with these APIs. The bytes APIs should only be used on systems where undecodable file names can be present; that’s pretty much only Unix systems now.

Tips for Writing Unicode-aware Programs
---------------------------------------

This section provides some suggestions on writing software that deals with Unicode. The most important tip is:

   Software should only work with Unicode strings internally, decoding the input data as soon as possible and encoding the output only at the end.

If you attempt to write processing functions that accept both Unicode and byte strings, you will find your program vulnerable to bugs wherever you combine the two different kinds of strings. There is no automatic encoding or decoding: if you do e.g. "str + bytes", a "TypeError" will be raised.

When using data coming from a web browser or some other untrusted source, a common technique is to check for illegal characters in a string before using the string in a generated command line or storing it in a database. If you’re doing this, be careful to check the decoded string, not the encoded bytes data; some encodings may have interesting properties, such as not being bijective or not being fully ASCII-compatible. This is especially true if the input data also specifies the encoding, since the attacker can then choose a clever way to hide malicious text in the encoded bytestream.

Converting Between File Encodings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The "StreamRecoder" class can transparently convert between encodings, taking a stream that returns data in encoding #1 and behaving like a stream returning data in encoding #2.

For example, if you have an input file *f* that’s in Latin-1, you can wrap it with a "StreamRecoder" to return bytes encoded in UTF-8:

   new_f = codecs.StreamRecoder(f,
       # en/decoder: used by read() to encode its results and
       # by write() to decode its input.
       codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'),
       # reader/writer: used to read and write to the stream.
       codecs.getreader('latin-1'), codecs.getwriter('latin-1') )

Files in an Unknown Encoding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What can you do if you need to make a change to a file, but don’t know the file’s encoding? If you know the encoding is ASCII-compatible and only want to examine or modify the ASCII parts, you can open the file with the "surrogateescape" error handler:

   with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
       data = f.read()

   # make changes to the string 'data'

   with open(fname + '.new', 'w',
             encoding="ascii", errors="surrogateescape") as f:
       f.write(data)

The "surrogateescape" error handler will decode any non-ASCII bytes as code points in a special range running from U+DC80 to U+DCFF. These code points will then turn back into the same bytes when the "surrogateescape" error handler is used to encode the data and write it back out.

References
----------

One section of Mastering Python 3 Input/Output, a PyCon 2010 talk by David Beazley, discusses text processing and binary data handling.
The PDF slides for Marc-André Lemburg’s presentation “Writing Unicode- aware Applications in Python” discuss questions of character encodings as well as how to internationalize and localize an application. These slides cover Python 2.x only. The Guts of Unicode in Python is a PyCon 2013 talk by Benjamin Peterson that discusses the internal Unicode representation in Python 3.3. Acknowledgements ================ The initial draft of this document was written by Andrew Kuchling. It has since been revised further by Alexander Belopolsky, Georg Brandl, Andrew Kuchling, and Ezio Melotti. Thanks to the following people who have noted errors or offered suggestions on this article: Éric Araujo, Nicholas Bastin, Nick Coghlan, Marius Gedminas, Kent Johnson, Ken Krugler, Marc-André Lemburg, Martin von Löwis, Terry J. Reedy, Serhiy Storchaka, Eryk Sun, Chad Whitacre, Graham Wideman. HOWTO Fetch Internet Resources Using The urllib Package ******************************************************* Author: Michael Foord Introduction ============ Related Articles ^^^^^^^^^^^^^^^^ You may also find useful the following article on fetching web resources with Python: * Basic Authentication A tutorial on *Basic Authentication*, with examples in Python. **urllib.request** is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the *urlopen* function. This is capable of fetching URLs using a variety of different protocols. It also offers a slightly more complex interface for handling common situations - like basic authentication, cookies, proxies and so on. These are provided by objects called handlers and openers. urllib.request supports fetching URLs for many “URL schemes” (identified by the string before the "":"" in URL - for example ""ftp"" is the URL scheme of ""ftp://python.org/"") using their associated network protocols (e.g. FTP, HTTP). This tutorial focuses on the most common case, HTTP. For straightforward situations *urlopen* is very easy to use. But as soon as you encounter errors or non-trivial cases when opening HTTP URLs, you will need some understanding of the HyperText Transfer Protocol. The most comprehensive and authoritative reference to HTTP is **RFC 2616**. This is a technical document and not intended to be easy to read. This HOWTO aims to illustrate using *urllib*, with enough detail about HTTP to help you through. It is not intended to replace the "urllib.request" docs, but is supplementary to them. Fetching URLs ============= The simplest way to use urllib.request is as follows: import urllib.request with urllib.request.urlopen('http://python.org/') as response: html = response.read() If you wish to retrieve a resource via URL and store it in a temporary location, you can do so via the "shutil.copyfileobj()" and "tempfile.NamedTemporaryFile()" functions: import shutil import tempfile import urllib.request with urllib.request.urlopen('http://python.org/') as response: with tempfile.NamedTemporaryFile(delete=False) as tmp_file: shutil.copyfileobj(response, tmp_file) with open(tmp_file.name) as html: pass Many uses of urllib will be that simple (note that instead of an ‘http:’ URL we could have used a URL starting with ‘ftp:’, ‘file:’, etc.). However, it’s the purpose of this tutorial to explain the more complicated cases, concentrating on HTTP. HTTP is based on requests and responses - the client makes requests and servers send responses. 
urllib.request mirrors this with a "Request" object which represents the HTTP request you are making. In its simplest form you create a Request object that specifies the URL you want to fetch. Calling "urlopen" with this Request object returns a response object for the URL requested. This response is a file-like object, which means you can for example call ".read()" on the response:

   import urllib.request

   req = urllib.request.Request('http://python.org/')
   with urllib.request.urlopen(req) as response:
       the_page = response.read()

Note that urllib.request makes use of the same Request interface to handle all URL schemes. For example, you can make an FTP request like so:

   req = urllib.request.Request('ftp://example.com/')

In the case of HTTP, there are two extra things that Request objects allow you to do: First, you can pass data to be sent to the server. Second, you can pass extra information (“metadata”) *about* the data or about the request itself, to the server - this information is sent as HTTP “headers”. Let’s look at each of these in turn.

Data
----

Sometimes you want to send data to a URL (often the URL will refer to a CGI (Common Gateway Interface) script or other web application). With HTTP, this is often done using what’s known as a **POST** request. This is often what your browser does when you submit an HTML form that you filled in on the web. Not all POSTs have to come from forms: you can use a POST to transmit arbitrary data to your own application. In the common case of HTML forms, the data needs to be encoded in a standard way, and then passed to the Request object as the "data" argument. The encoding is done using a function from the "urllib.parse" library.

   import urllib.parse
   import urllib.request

   url = 'http://www.someserver.com/cgi-bin/register.cgi'
   values = {'name' : 'Michael Foord',
             'location' : 'Northampton',
             'language' : 'Python' }

   data = urllib.parse.urlencode(values)
   data = data.encode('ascii')  # data should be bytes
   req = urllib.request.Request(url, data)
   with urllib.request.urlopen(req) as response:
       the_page = response.read()

Note that other encodings are sometimes required (e.g. for file upload from HTML forms - see HTML Specification, Form Submission for more details).

If you do not pass the "data" argument, urllib uses a **GET** request. One way in which GET and POST requests differ is that POST requests often have “side-effects”: they change the state of the system in some way (for example by placing an order with the website for a hundredweight of tinned spam to be delivered to your door). Though the HTTP standard makes it clear that POSTs are intended to *always* cause side-effects, and GET requests *never* to cause side-effects, nothing prevents a GET request from having side-effects, nor a POST request from having no side-effects.

Data can also be passed in an HTTP GET request by encoding it in the URL itself. This is done as follows:

   >>> import urllib.request
   >>> import urllib.parse
   >>> data = {}
   >>> data['name'] = 'Somebody Here'
   >>> data['location'] = 'Northampton'
   >>> data['language'] = 'Python'
   >>> url_values = urllib.parse.urlencode(data)
   >>> print(url_values)  # The order may differ from below.
   name=Somebody+Here&language=Python&location=Northampton
   >>> url = 'http://www.example.com/example.cgi'
   >>> full_url = url + '?' + url_values
   >>> data = urllib.request.urlopen(full_url)

Notice that the full URL is created by adding a "?" to the URL, followed by the encoded values.
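As a side note, GET and POST are not the only possibilities. If you ever need another HTTP method, "Request" also accepts a *method* argument; here is a brief sketch (the URL and payload are placeholders, not part of the original tutorial):

   import urllib.request

   # Select the HTTP method explicitly instead of relying on the
   # GET/POST choice implied by the *data* argument.
   req = urllib.request.Request('http://www.example.com/resource',
                                data=b'some payload',
                                method='PUT')
   print(req.get_method())  # 'PUT'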
Headers
-------

We’ll discuss here one particular HTTP header, to illustrate how to add headers to your HTTP request.

Some websites [1] dislike being browsed by programs, or send different versions to different browsers [2]. By default urllib identifies itself as "Python-urllib/x.y" (where "x" and "y" are the major and minor version numbers of the Python release, e.g. "Python-urllib/2.5"), which may confuse the site, or just plain not work. The way a browser identifies itself is through the "User-Agent" header [3]. When you create a Request object you can pass a dictionary of headers in. The following example makes the same request as above, but identifies itself as a version of Internet Explorer [4].

   import urllib.parse
   import urllib.request

   url = 'http://www.someserver.com/cgi-bin/register.cgi'
   user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
   values = {'name': 'Michael Foord',
             'location': 'Northampton',
             'language': 'Python' }
   headers = {'User-Agent': user_agent}

   data = urllib.parse.urlencode(values)
   data = data.encode('ascii')
   req = urllib.request.Request(url, data, headers)
   with urllib.request.urlopen(req) as response:
       the_page = response.read()

The response also has two useful methods. See the section on info and geturl which comes after we have a look at what happens when things go wrong.

Handling Exceptions
===================

*urlopen* raises "URLError" when it cannot handle a response (though as usual with Python APIs, built-in exceptions such as "ValueError", "TypeError" etc. may also be raised). "HTTPError" is the subclass of "URLError" raised in the specific case of HTTP URLs. The exception classes are exported from the "urllib.error" module.

URLError
--------

Often, URLError is raised because there is no network connection (no route to the specified server), or the specified server doesn’t exist. In this case, the exception raised will have a ‘reason’ attribute, which is a tuple containing an error code and a text error message. e.g.

   >>> req = urllib.request.Request('http://www.pretend_server.org')
   >>> try: urllib.request.urlopen(req)
   ... except urllib.error.URLError as e:
   ...     print(e.reason)
   ...
   (4, 'getaddrinfo failed')

HTTPError
---------

Every HTTP response from the server contains a numeric “status code”. Sometimes the status code indicates that the server is unable to fulfil the request. The default handlers will handle some of these responses for you (for example, if the response is a “redirection” that requests the client fetch the document from a different URL, urllib will handle that for you). For those it can’t handle, urlopen will raise an "HTTPError". Typical errors include ‘404’ (page not found), ‘403’ (request forbidden), and ‘401’ (authentication required).

See section 10 of **RFC 2616** for a reference on all the HTTP error codes.

The "HTTPError" instance raised will have an integer ‘code’ attribute, which corresponds to the error sent by the server.

Error Codes
~~~~~~~~~~~

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100–299 range indicate success, you will usually only see error codes in the 400–599 range.

"http.server.BaseHTTPRequestHandler.responses" is a useful dictionary of response codes that shows all the response codes used by **RFC 2616**. An excerpt from the dictionary is shown below:

   responses = {
       ...
       200: ('OK', 'Request fulfilled, document follows'),
       ...
       403: ('Forbidden',
             'Request forbidden -- authorization will not help'),
       404: ('Not Found', 'Nothing matches the given URI'),
       ...
       418: ("I'm a Teapot",
             'Server refuses to brew coffee because it is a teapot'),
       ...
       503: ('Service Unavailable',
             'The server cannot process the request due to a high load'),
       ...
   }
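If your own error handling needs a human-readable message for a given code, you can look it up in this dictionary directly; a minimal sketch (using 404 purely as an example):

   import http.server

   # Each entry maps a status code to a (short message, long message) pair.
   short, explanation = http.server.BaseHTTPRequestHandler.responses[404]
   print(short)        # 'Not Found'
   print(explanation)  # 'Nothing matches the given URI'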
: ("I'm a Teapot", 'Server refuses to brew coffee because ' 'it is a teapot'), ... : ('Service Unavailable', 'The server cannot process the ' 'request due to a high load'), ... } When an error is raised the server responds by returning an HTTP error code *and* an error page. You can use the "HTTPError" instance as a response on the page returned. This means that as well as the code attribute, it also has read, geturl, and info, methods as returned by the "urllib.response" module: >>> req = urllib.request.Request('http://www.python.org/fish.html') >>> try: ... urllib.request.urlopen(req) ... except urllib.error.HTTPError as e: ... print(e.code) ... print(e.read()) ... 404 b'\n\n\nPage Not Found\n ... Wrapping it Up -------------- So if you want to be prepared for "HTTPError" *or* "URLError" there are two basic approaches. I prefer the second approach. Number 1 ~~~~~~~~ from urllib.request import Request, urlopen from urllib.error import URLError, HTTPError req = Request(someurl) try: response = urlopen(req) except HTTPError as e: print('The server couldn\'t fulfill the request.') print('Error code: ', e.code) except URLError as e: print('We failed to reach a server.') print('Reason: ', e.reason) else: # everything is fine Note: The "except HTTPError" *must* come first, otherwise "except URLError" will *also* catch an "HTTPError". Number 2 ~~~~~~~~ from urllib.request import Request, urlopen from urllib.error import URLError req = Request(someurl) try: response = urlopen(req) except URLError as e: if hasattr(e, 'reason'): print('We failed to reach a server.') print('Reason: ', e.reason) elif hasattr(e, 'code'): print('The server couldn\'t fulfill the request.') print('Error code: ', e.code) else: # everything is fine info and geturl =============== The response returned by urlopen (or the "HTTPError" instance) has two useful methods "info()" and "geturl()" and is defined in the module "urllib.response". * **geturl** - this returns the real URL of the page fetched. This is useful because "urlopen" (or the opener object used) may have followed a redirect. The URL of the page fetched may not be the same as the URL requested. * **info** - this returns a dictionary-like object that describes the page fetched, particularly the headers sent by the server. It is currently an "http.client.HTTPMessage" instance. Typical headers include ‘Content-length’, ‘Content-type’, and so on. See the Quick Reference to HTTP Headers for a useful listing of HTTP headers with brief explanations of their meaning and use. Openers and Handlers ==================== When you fetch a URL you use an opener (an instance of the perhaps confusingly named "urllib.request.OpenerDirector"). Normally we have been using the default opener - via "urlopen" - but you can create custom openers. Openers use handlers. All the “heavy lifting” is done by the handlers. Each handler knows how to open URLs for a particular URL scheme (http, ftp, etc.), or how to handle an aspect of URL opening, for example HTTP redirections or HTTP cookies. You will want to create openers if you want to fetch URLs with specific handlers installed, for example to get an opener that handles cookies, or to get an opener that does not handle redirections. To create an opener, instantiate an "OpenerDirector", and then call ".add_handler(some_handler_instance)" repeatedly. Alternatively, you can use "build_opener", which is a convenience function for creating opener objects with a single function call. 
"build_opener" adds several handlers by default, but provides a quick way to add more and/or override the default handlers. Other sorts of handlers you might want to can handle proxies, authentication, and other common but slightly specialised situations. "install_opener" can be used to make an "opener" object the (global) default opener. This means that calls to "urlopen" will use the opener you have installed. Opener objects have an "open" method, which can be called directly to fetch urls in the same way as the "urlopen" function: there’s no need to call "install_opener", except as a convenience. Basic Authentication ==================== To illustrate creating and installing a handler we will use the "HTTPBasicAuthHandler". For a more detailed discussion of this subject – including an explanation of how Basic Authentication works - see the Basic Authentication Tutorial. When authentication is required, the server sends a header (as well as the 401 error code) requesting authentication. This specifies the authentication scheme and a ‘realm’. The header looks like: "WWW- Authenticate: SCHEME realm="REALM"". e.g. WWW-Authenticate: Basic realm="cPanel Users" The client should then retry the request with the appropriate name and password for the realm included as a header in the request. This is ‘basic authentication’. In order to simplify this process we can create an instance of "HTTPBasicAuthHandler" and an opener to use this handler. The "HTTPBasicAuthHandler" uses an object called a password manager to handle the mapping of URLs and realms to passwords and usernames. If you know what the realm is (from the authentication header sent by the server), then you can use a "HTTPPasswordMgr". Frequently one doesn’t care what the realm is. In that case, it is convenient to use "HTTPPasswordMgrWithDefaultRealm". This allows you to specify a default username and password for a URL. This will be supplied in the absence of you providing an alternative combination for a specific realm. We indicate this by providing "None" as the realm argument to the "add_password" method. The top-level URL is the first URL that requires authentication. URLs “deeper” than the URL you pass to .add_password() will also match. # create a password manager password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm() # Add the username and password. # If we knew the realm, we could use it instead of None. top_level_url = "http://example.com/foo/" password_mgr.add_password(None, top_level_url, username, password) handler = urllib.request.HTTPBasicAuthHandler(password_mgr) # create "opener" (OpenerDirector instance) opener = urllib.request.build_opener(handler) # use the opener to fetch a URL opener.open(a_url) # Install the opener. # Now all calls to urllib.request.urlopen use our opener. urllib.request.install_opener(opener) Note: In the above example we only supplied our "HTTPBasicAuthHandler" to "build_opener". By default openers have the handlers for normal situations – "ProxyHandler" (if a proxy setting such as an "http_proxy" environment variable is set), "UnknownHandler", "HTTPHandler", "HTTPDefaultErrorHandler", "HTTPRedirectHandler", "FTPHandler", "FileHandler", "DataHandler", "HTTPErrorProcessor". "top_level_url" is in fact *either* a full URL (including the ‘http:’ scheme component and the hostname and optionally the port number) e.g. ""http://example.com/"" *or* an “authority” (i.e. the hostname, optionally including the port number) e.g. 
""example.com"" or ""example.com:8080"" (the latter example includes a port number). The authority, if present, must NOT contain the “userinfo” component - for example ""joe:password@example.com"" is not correct. Proxies ======= **urllib** will auto-detect your proxy settings and use those. This is through the "ProxyHandler", which is part of the normal handler chain when a proxy setting is detected. Normally that’s a good thing, but there are occasions when it may not be helpful [5]. One way to do this is to setup our own "ProxyHandler", with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler: >>> proxy_support = urllib.request.ProxyHandler({}) >>> opener = urllib.request.build_opener(proxy_support) >>> urllib.request.install_opener(opener) Note: Currently "urllib.request" *does not* support fetching of "https" locations through a proxy. However, this can be enabled by extending urllib.request as shown in the recipe [6]. Note: "HTTP_PROXY" will be ignored if a variable "REQUEST_METHOD" is set; see the documentation on "getproxies()". Sockets and Layers ================== The Python support for fetching resources from the web is layered. urllib uses the "http.client" library, which in turn uses the socket library. As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has *no timeout* and can hang. Currently, the socket timeout is not exposed at the http.client or urllib.request levels. However, you can set the default timeout globally for all sockets using import socket import urllib.request # timeout in seconds timeout = 10 socket.setdefaulttimeout(timeout) # this call to urllib.request.urlopen now uses the default timeout # we have set in the socket module req = urllib.request.Request('http://www.voidspace.org.uk') response = urllib.request.urlopen(req) ====================================================================== Footnotes ========= This document was reviewed and revised by John Lee. [1] Google for example. [2] Browser sniffing is a very bad practice for website design - building sites using web standards is much more sensible. Unfortunately a lot of sites still send different versions to different browsers. [3] The user agent for MSIE 6 is *‘Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)’* [4] For details of more HTTP request headers, see Quick Reference to HTTP Headers. [5] In my case I have to use a proxy to access the internet at work. If you attempt to fetch *localhost* URLs through this proxy it blocks them. IE is set to use the proxy, which urllib picks up on. In order to test scripts with a localhost server, I have to prevent urllib from using the proxy. [6] urllib opener for SSL proxy (CONNECT method): ASPN Cookbook Recipe. Installing Python Modules ************************* Email: distutils-sig@python.org As a popular open source development project, Python has an active supporting community of contributors and users that also make their software available for other Python developers to use under open source license terms. This allows Python users to share and collaborate effectively, benefiting from the solutions others have already created to common (and sometimes even rare!) problems, as well as potentially contributing their own solutions to the common pool. This guide covers the installation part of the process. 
For a guide to creating and sharing your own Python projects, refer to the Python packaging user guide. Note: For corporate and other institutional users, be aware that many organisations have their own policies around using and contributing to open source software. Please take such policies into account when making use of the distribution and installation tools provided with Python. Key terms ========= * "pip" is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers. * A *virtual environment* is a semi-isolated Python environment that allows packages to be installed for use by a particular application, rather than being installed system wide. * "venv" is the standard tool for creating virtual environments, and has been part of Python since Python 3.3. Starting with Python 3.4, it defaults to installing "pip" into all created virtual environments. * "virtualenv" is a third party alternative (and predecessor) to "venv". It allows virtual environments to be used on versions of Python prior to 3.4, which either don’t provide "venv" at all, or aren’t able to automatically install "pip" into created environments. * The Python Package Index is a public repository of open source licensed packages made available for use by other Python users. * the Python Packaging Authority is the group of developers and documentation authors responsible for the maintenance and evolution of the standard packaging tools and the associated metadata and file format standards. They maintain a variety of tools, documentation, and issue trackers on GitHub. * "distutils" is the original build and distribution system first added to the Python standard library in 1998. While direct use of "distutils" is being phased out, it still laid the foundation for the current packaging and distribution infrastructure, and it not only remains part of the standard library, but its name lives on in other ways (such as the name of the mailing list used to coordinate Python packaging standards development). Changed in version 3.5: The use of "venv" is now recommended for creating virtual environments. See also: Python Packaging User Guide: Creating and using virtual environments Basic usage =========== The standard packaging tools are all designed to be used from the command line. The following command will install the latest version of a module and its dependencies from the Python Package Index: python -m pip install SomePackage Note: For POSIX users (including macOS and Linux users), the examples in this guide assume the use of a *virtual environment*.For Windows users, the examples in this guide assume that the option to adjust the system PATH environment variable was selected when installing Python. It’s also possible to specify an exact or minimum version directly on the command line. When using comparator operators such as ">", "<" or some other special character which get interpreted by shell, the package name and the version should be enclosed within double quotes: python -m pip install SomePackage==1.0.4 # specific version python -m pip install "SomePackage>=1.0.4" # minimum version Normally, if a suitable module is already installed, attempting to install it again will have no effect. Upgrading existing modules must be requested explicitly: python -m pip install --upgrade SomePackage More information and resources regarding "pip" and its capabilities can be found in the Python Packaging User Guide. 
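Beyond installing and upgrading, a few other everyday "pip" subcommands are worth knowing about (a brief aside; "SomePackage" is a placeholder as above):

   python -m pip list                   # list installed packages
   python -m pip show SomePackage       # display metadata for one package
   python -m pip uninstall SomePackage  # remove an installed package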
Creation of virtual environments is done through the "venv" module. Installing packages into an active virtual environment uses the commands shown above. See also: Python Packaging User Guide: Installing Python Distribution Packages How do I …? =========== These are quick answers or links for some common tasks. … install "pip" in versions of Python prior to Python 3.4? ---------------------------------------------------------- Python only started bundling "pip" with Python 3.4. For earlier versions, "pip" needs to be “bootstrapped” as described in the Python Packaging User Guide. See also: Python Packaging User Guide: Requirements for Installing Packages … install packages just for the current user? --------------------------------------------- Passing the "--user" option to "python -m pip install" will install a package just for the current user, rather than for all users of the system. … install scientific Python packages? ------------------------------------- A number of scientific Python packages have complex binary dependencies, and aren’t currently easy to install using "pip" directly. At this point in time, it will often be easier for users to install these packages by other means rather than attempting to install them with "pip". See also: Python Packaging User Guide: Installing Scientific Packages … work with multiple versions of Python installed in parallel? -------------------------------------------------------------- On Linux, macOS, and other POSIX systems, use the versioned Python commands in combination with the "-m" switch to run the appropriate copy of "pip": python2 -m pip install SomePackage # default Python 2 python2.7 -m pip install SomePackage # specifically Python 2.7 python3 -m pip install SomePackage # default Python 3 python3.4 -m pip install SomePackage # specifically Python 3.4 Appropriately versioned "pip" commands may also be available. On Windows, use the "py" Python launcher in combination with the "-m" switch: py -2 -m pip install SomePackage # default Python 2 py -2.7 -m pip install SomePackage # specifically Python 2.7 py -3 -m pip install SomePackage # default Python 3 py -3.4 -m pip install SomePackage # specifically Python 3.4 Common installation issues ========================== Installing into the system Python on Linux ------------------------------------------ On Linux systems, a Python installation will typically be included as part of the distribution. Installing into this Python installation requires root access to the system, and may interfere with the operation of the system package manager and other components of the system if a component is unexpectedly upgraded using "pip". On such systems, it is often better to use a virtual environment or a per-user installation when installing packages with "pip". Pip not installed ----------------- It is possible that "pip" does not get installed by default. One potential fix is: python -m ensurepip --default-pip There are also additional resources for installing pip. Installing binary extensions ---------------------------- Python has typically relied heavily on source based distribution, with end users being expected to compile extension modules from source as part of the installation process. 
With the introduction of support for the binary "wheel" format, and the ability to publish wheels for at least Windows and macOS through the Python Package Index, this problem is expected to diminish over time, as users are more regularly able to install pre-built extensions rather than needing to build them themselves.

Some of the solutions for installing scientific software that are not yet available as pre-built "wheel" files may also help with obtaining other binary extensions without needing to build them locally.

See also: Python Packaging User Guide: Binary Extensions

"__future__" — Future statement definitions
*******************************************

**Source code:** Lib/__future__.py

======================================================================

Imports of the form "from __future__ import feature" are called future statements. These are special-cased by the Python compiler to allow the use of new Python features in modules containing the future statement before the release in which the feature becomes standard.

While these future statements are given additional special meaning by the Python compiler, they are still executed like any other import statement, and the "__future__" module exists and is handled by the import system the same way any other Python module would be. This design serves three purposes:

* To avoid confusing existing tools that analyze import statements and expect to find the modules they’re importing.

* To document when incompatible changes were introduced, and when they will be — or were — made mandatory. This is a form of executable documentation, and can be inspected programmatically via importing "__future__" and examining its contents.

* To ensure that future statements run under releases prior to Python 2.1 at least yield runtime exceptions (the import of "__future__" will fail, because there was no module of that name prior to 2.1).

Module Contents
===============

No feature description will ever be deleted from "__future__".
Since its introduction in Python 2.1 the following features have found their way into the language using this mechanism: +--------------------+---------------+----------------+-----------------------------------------------+ | feature | optional in | mandatory in | effect | |====================|===============|================|===============================================| | nested_scopes | 2.1.0b1 | 2.2 | **PEP 227**: *Statically Nested Scopes* | +--------------------+---------------+----------------+-----------------------------------------------+ | generators | 2.2.0a1 | 2.3 | **PEP 255**: *Simple Generators* | +--------------------+---------------+----------------+-----------------------------------------------+ | division | 2.2.0a2 | 3.0 | **PEP 238**: *Changing the Division Operator* | +--------------------+---------------+----------------+-----------------------------------------------+ | absolute_import | 2.5.0a1 | 3.0 | **PEP 328**: *Imports: Multi-Line and | | | | | Absolute/Relative* | +--------------------+---------------+----------------+-----------------------------------------------+ | with_statement | 2.5.0a1 | 2.6 | **PEP 343**: *The “with” Statement* | +--------------------+---------------+----------------+-----------------------------------------------+ | print_function | 2.6.0a2 | 3.0 | **PEP 3105**: *Make print a function* | +--------------------+---------------+----------------+-----------------------------------------------+ | unicode_literals | 2.6.0a2 | 3.0 | **PEP 3112**: *Bytes literals in Python 3000* | +--------------------+---------------+----------------+-----------------------------------------------+ | generator_stop | 3.5.0b1 | 3.7 | **PEP 479**: *StopIteration handling inside | | | | | generators* | +--------------------+---------------+----------------+-----------------------------------------------+ | annotations | 3.7.0b1 | TBD [1] | **PEP 563**: *Postponed evaluation of | | | | | annotations* | +--------------------+---------------+----------------+-----------------------------------------------+ class __future__._Feature Each statement in "__future__.py" is of the form: FeatureName = _Feature(OptionalRelease, MandatoryRelease, CompilerFlag) where, normally, *OptionalRelease* is less than *MandatoryRelease*, and both are 5-tuples of the same form as "sys.version_info": (PY_MAJOR_VERSION, # the 2 in 2.1.0a3; an int PY_MINOR_VERSION, # the 1; an int PY_MICRO_VERSION, # the 0; an int PY_RELEASE_LEVEL, # "alpha", "beta", "candidate" or "final"; string PY_RELEASE_SERIAL # the 3; an int ) _Feature.getOptionalRelease() *OptionalRelease* records the first release in which the feature was accepted. _Feature.getMandatoryRelease() In the case of a *MandatoryRelease* that has not yet occurred, *MandatoryRelease* predicts the release in which the feature will become part of the language. Else *MandatoryRelease* records when the feature became part of the language; in releases at or after that, modules no longer need a future statement to use the feature in question, but may continue to use such imports. *MandatoryRelease* may also be "None", meaning that a planned feature got dropped or that it is not yet decided. _Feature.compiler_flag *CompilerFlag* is the (bitfield) flag that should be passed in the fourth argument to the built-in function "compile()" to enable the feature in dynamically compiled code. This flag is stored in the "_Feature.compiler_flag" attribute on "_Feature" instances. 
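As an illustrative sketch (not part of the reference entries above), the release accessors and the compiler flag can be exercised like this, using the "division" feature from the table:

   import __future__

   print(__future__.division.getOptionalRelease())   # (2, 2, 0, 'alpha', 2)
   print(__future__.division.getMandatoryRelease())  # (3, 0, 0, 'alpha', 0)

   # Compile a string as if it began with "from __future__ import division";
   # on Python 3 this is a no-op, since true division is already the default.
   code = compile('print(1 / 2)', '<example>', 'exec',
                  flags=__future__.division.compiler_flag)
   exec(code)  # prints 0.5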
[1] "from __future__ import annotations" was previously scheduled to become mandatory in Python 3.10, but the Python Steering Council twice decided to delay the change (announcement for Python 3.10; announcement for Python 3.11). No final decision has been made yet. See also **PEP 563** and **PEP 649**. See also: Future statements How the compiler treats future imports. **PEP 236** - Back to the __future__ The original proposal for the __future__ mechanism. "__main__" — Top-level code environment *************************************** ====================================================================== In Python, the special name "__main__" is used for two important constructs: 1. the name of the top-level environment of the program, which can be checked using the "__name__ == '__main__'" expression; and 2. the "__main__.py" file in Python packages. Both of these mechanisms are related to Python modules; how users interact with them and how they interact with each other. They are explained in detail below. If you’re new to Python modules, see the tutorial section Modules for an introduction. "__name__ == '__main__'" ======================== When a Python module or package is imported, "__name__" is set to the module’s name. Usually, this is the name of the Python file itself without the ".py" extension: >>> import configparser >>> configparser.__name__ 'configparser' If the file is part of a package, "__name__" will also include the parent package’s path: >>> from concurrent.futures import process >>> process.__name__ 'concurrent.futures.process' However, if the module is executed in the top-level code environment, its "__name__" is set to the string "'__main__'". What is the “top-level code environment”? ----------------------------------------- "__main__" is the name of the environment where top-level code is run. “Top-level code” is the first user-specified Python module that starts running. It’s “top-level” because it imports all other modules that the program needs. Sometimes “top-level code” is called an *entry point* to the application. The top-level code environment can be: * the scope of an interactive prompt: >>> __name__ '__main__' * the Python module passed to the Python interpreter as a file argument: $ python helloworld.py Hello, world! * the Python module or package passed to the Python interpreter with the "-m" argument: $ python -m tarfile usage: tarfile.py [-h] [-v] (...) * Python code read by the Python interpreter from standard input: $ echo "import this" | python The Zen of Python, by Tim Peters Beautiful is better than ugly. Explicit is better than implicit. ... * Python code passed to the Python interpreter with the "-c" argument: $ python -c "import this" The Zen of Python, by Tim Peters Beautiful is better than ugly. Explicit is better than implicit. ... In each of these situations, the top-level module’s "__name__" is set to "'__main__'". As a result, a module can discover whether or not it is running in the top-level environment by checking its own "__name__", which allows a common idiom for conditionally executing code when the module is not initialized from an import statement: if __name__ == '__main__': # Execute when the module is not initialized from an import statement. ... See also: For a more detailed look at how "__name__" is set in all situations, see the tutorial section Modules. Idiomatic Usage --------------- Some modules contain code that is intended for script use only, like parsing command-line arguments or fetching data from standard input. 
If a module like this was imported from a different module, for example to unit test it, the script code would unintentionally execute as well. This is where using the "if __name__ == '__main__'" code block comes in handy. Code within this block won’t run unless the module is executed in the top-level environment. Putting as few statements as possible in the block below "if __name__ == '__main__'" can improve code clarity and correctness. Most often, a function named "main" encapsulates the program’s primary behavior: # echo.py import shlex import sys def echo(phrase: str) -> None: """A dummy wrapper around print.""" # for demonstration purposes, you can imagine that there is some # valuable and reusable logic inside this function print(phrase) def main() -> int: """Echo the input arguments to standard output""" phrase = shlex.join(sys.argv) echo(phrase) return 0 if __name__ == '__main__': sys.exit(main()) # next section explains the use of sys.exit Note that if the module didn’t encapsulate code inside the "main" function but instead put it directly within the "if __name__ == '__main__'" block, the "phrase" variable would be global to the entire module. This is error-prone as other functions within the module could be unintentionally using the global variable instead of a local name. A "main" function solves this problem. Using a "main" function has the added benefit of the "echo" function itself being isolated and importable elsewhere. When "echo.py" is imported, the "echo" and "main" functions will be defined, but neither of them will be called, because "__name__ != '__main__'". Packaging Considerations ------------------------ "main" functions are often used to create command-line tools by specifying them as entry points for console scripts. When this is done, pip inserts the function call into a template script, where the return value of "main" is passed into "sys.exit()". For example: sys.exit(main()) Since the call to "main" is wrapped in "sys.exit()", the expectation is that your function will return some value acceptable as an input to "sys.exit()"; typically, an integer or "None" (which is implicitly returned if your function does not have a return statement). By proactively following this convention ourselves, our module will have the same behavior when run directly (i.e. "python echo.py") as it will have if we later package it as a console script entry-point in a pip-installable package. In particular, be careful about returning strings from your "main" function. "sys.exit()" will interpret a string argument as a failure message, so your program will have an exit code of "1", indicating failure, and the string will be written to "sys.stderr". The "echo.py" example from earlier exemplifies using the "sys.exit(main())" convention. See also: Python Packaging User Guide contains a collection of tutorials and references on how to distribute and install Python packages with modern tools. "__main__.py" in Python Packages ================================ If you are not familiar with Python packages, see section Packages of the tutorial. Most commonly, the "__main__.py" file is used to provide a command-line interface for a package. Consider the following hypothetical package, “bandclass”: bandclass ├── __init__.py ├── __main__.py └── student.py "__main__.py" will be executed when the package itself is invoked directly from the command line using the "-m" flag. For example: $ python -m bandclass This command will cause "__main__.py" to run. 
How you utilize this mechanism will depend on the nature of the package you are writing, but in this hypothetical case, it might make sense to allow the teacher to search for students: # bandclass/__main__.py import sys from .student import search_students student_name = sys.argv[1] if len(sys.argv) >= 2 else '' print(f'Found student: {search_students(student_name)}') Note that "from .student import search_students" is an example of a relative import. This import style can be used when referencing modules within a package. For more details, see Intra-package References in the Modules section of the tutorial. Idiomatic Usage --------------- The content of "__main__.py" typically isn’t fenced with an "if __name__ == '__main__'" block. Instead, those files are kept short and import functions to execute from other modules. Those other modules can then be easily unit-tested and are properly reusable. If used, an "if __name__ == '__main__'" block will still work as expected for a "__main__.py" file within a package, because its "__name__" attribute will include the package’s path if imported: >>> import asyncio.__main__ >>> asyncio.__main__.__name__ 'asyncio.__main__' This won’t work for "__main__.py" files in the root directory of a ".zip" file though. Hence, for consistency, a minimal "__main__.py" without a "__name__" check is preferred. See also: See "venv" for an example of a package with a minimal "__main__.py" in the standard library. It doesn’t contain an "if __name__ == '__main__'" block. You can invoke it with "python -m venv [directory]". See "runpy" for more details on the "-m" flag to the interpreter executable. See "zipapp" for how to run applications packaged as *.zip* files. In this case Python looks for a "__main__.py" file in the root directory of the archive. "import __main__" ================= Regardless of which module a Python program was started with, other modules running within that same program can import the top-level environment’s scope (*namespace*) by importing the "__main__" module. This doesn’t import a "__main__.py" file but rather whichever module received the special name "'__main__'". Here is an example module that consumes the "__main__" namespace: # namely.py import __main__ def did_user_define_their_name(): return 'my_name' in dir(__main__) def print_user_name(): if not did_user_define_their_name(): raise ValueError('Define the variable `my_name`!') if '__file__' in dir(__main__): print(__main__.my_name, "found in file", __main__.__file__) else: print(__main__.my_name) Example usage of this module could be as follows: # start.py import sys from namely import print_user_name # my_name = "Dinsdale" def main(): try: print_user_name() except ValueError as ve: return str(ve) if __name__ == "__main__": sys.exit(main()) Now, if we started our program, the result would look like this: $ python start.py Define the variable `my_name`! The exit code of the program would be 1, indicating an error. Uncommenting the line with "my_name = "Dinsdale"" fixes the program and now it exits with status code 0, indicating success: $ python start.py Dinsdale found in file /path/to/start.py Note that importing "__main__" doesn’t cause any issues with unintentionally running top-level code meant for script use which is put in the "if __name__ == "__main__"" block of the "start" module. Why does this work? Python inserts an empty "__main__" module in "sys.modules" at interpreter startup, and populates it by running top-level code.
In our example this is the "start" module which runs line by line and imports "namely". In turn, "namely" imports "__main__" (which is really "start"). That’s an import cycle! Fortunately, since the partially populated "__main__" module is present in "sys.modules", Python passes that to "namely". See Special considerations for __main__ in the import system’s reference for details on how this works. The Python REPL is another example of a “top-level environment”, so anything defined in the REPL becomes part of the "__main__" scope: >>> import namely >>> namely.did_user_define_their_name() False >>> namely.print_user_name() Traceback (most recent call last): ... ValueError: Define the variable `my_name`! >>> my_name = 'Jabberwocky' >>> namely.did_user_define_their_name() True >>> namely.print_user_name() Jabberwocky Note that in this case the "__main__" scope doesn’t contain a "__file__" attribute as it’s interactive. The "__main__" scope is used in the implementation of "pdb" and "rlcompleter". "_thread" — Low-level threading API *********************************** ====================================================================== This module provides low-level primitives for working with multiple threads (also called *light-weight processes* or *tasks*) — multiple threads of control sharing their global data space. For synchronization, simple locks (also called *mutexes* or *binary semaphores*) are provided. The "threading" module provides an easier to use and higher-level threading API built on top of this module. Changed in version 3.7: This module used to be optional; it is now always available. This module defines the following constants and functions: exception _thread.error Raised on thread-specific errors. Changed in version 3.3: This is now a synonym of the built-in "RuntimeError". _thread.LockType This is the type of lock objects. _thread.start_new_thread(function, args[, kwargs]) Start a new thread and return its identifier. The thread executes the function *function* with the argument list *args* (which must be a tuple). The optional *kwargs* argument specifies a dictionary of keyword arguments. When the function returns, the thread silently exits. When the function terminates with an unhandled exception, "sys.unraisablehook()" is called to handle the exception. The *object* attribute of the hook argument is *function*. By default, a stack trace is printed and then the thread exits (but other threads continue to run). When the function raises a "SystemExit" exception, it is silently ignored. Raises an auditing event "_thread.start_new_thread" with arguments "function", "args", "kwargs". Changed in version 3.8: "sys.unraisablehook()" is now used to handle unhandled exceptions. _thread.interrupt_main(signum=signal.SIGINT, /) Simulate the effect of a signal arriving in the main thread. A thread can use this function to interrupt the main thread, though there is no guarantee that the interruption will happen immediately. If given, *signum* is the number of the signal to simulate. If *signum* is not given, "signal.SIGINT" is simulated. If the given signal isn’t handled by Python (it was set to "signal.SIG_DFL" or "signal.SIG_IGN"), this function does nothing. Changed in version 3.10: The *signum* argument is added to customize the signal number. Note: This does not emit the corresponding signal but schedules a call to the associated handler (if it exists). If you want to truly emit the signal, use "signal.raise_signal()". _thread.exit() Raise the "SystemExit" exception.
When not caught, this will cause the thread to exit silently. _thread.allocate_lock() Return a new lock object. Methods of locks are described below. The lock is initially unlocked. _thread.get_ident() Return the ‘thread identifier’ of the current thread. This is a nonzero integer. Its value has no direct meaning; it is intended as a magic cookie to be used e.g. to index a dictionary of thread- specific data. Thread identifiers may be recycled when a thread exits and another thread is created. _thread.get_native_id() Return the native integral Thread ID of the current thread assigned by the kernel. This is a non-negative integer. Its value may be used to uniquely identify this particular thread system-wide (until the thread terminates, after which the value may be recycled by the OS). Availability: Windows, FreeBSD, Linux, macOS, OpenBSD, NetBSD, AIX, DragonFlyBSD, GNU/kFreeBSD. Added in version 3.8. Changed in version 3.13: Added support for GNU/kFreeBSD. _thread.stack_size([size]) Return the thread stack size used when creating new threads. The optional *size* argument specifies the stack size to be used for subsequently created threads, and must be 0 (use platform or configured default) or a positive integer value of at least 32,768 (32 KiB). If *size* is not specified, 0 is used. If changing the thread stack size is unsupported, a "RuntimeError" is raised. If the specified stack size is invalid, a "ValueError" is raised and the stack size is unmodified. 32 KiB is currently the minimum supported stack size value to guarantee sufficient stack space for the interpreter itself. Note that some platforms may have particular restrictions on values for the stack size, such as requiring a minimum stack size > 32 KiB or requiring allocation in multiples of the system memory page size - platform documentation should be referred to for more information (4 KiB pages are common; using multiples of 4096 for the stack size is the suggested approach in the absence of more specific information). Availability: Windows, pthreads. Unix platforms with POSIX threads support. _thread.TIMEOUT_MAX The maximum value allowed for the *timeout* parameter of "Lock.acquire". Specifying a timeout greater than this value will raise an "OverflowError". Added in version 3.2. Lock objects have the following methods: lock.acquire(blocking=True, timeout=-1) Without any optional argument, this method acquires the lock unconditionally, if necessary waiting until it is released by another thread (only one thread at a time can acquire a lock — that’s their reason for existence). If the *blocking* argument is present, the action depends on its value: if it is false, the lock is only acquired if it can be acquired immediately without waiting, while if it is true, the lock is acquired unconditionally as above. If the floating-point *timeout* argument is present and positive, it specifies the maximum wait time in seconds before returning. A negative *timeout* argument specifies an unbounded wait. You cannot specify a *timeout* if *blocking* is false. The return value is "True" if the lock is acquired successfully, "False" if not. Changed in version 3.2: The *timeout* parameter is new. Changed in version 3.2: Lock acquires can now be interrupted by signals on POSIX. lock.release() Releases the lock. The lock must have been acquired earlier, but not necessarily by the same thread. lock.locked() Return the status of the lock: "True" if it has been acquired by some thread, "False" if not. 
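Putting these methods together, here is a minimal sketch of both acquisition styles (the lock name, the placeholder work, and the two-second timeout are illustrative, not part of the API):

   import _thread

   stats_lock = _thread.allocate_lock()

   # Non-blocking attempt: acquire() returns False immediately if the
   # lock is currently held by another thread.
   if stats_lock.acquire(blocking=False):
       try:
           pass  # update shared state here
       finally:
           stats_lock.release()

   # Bounded wait: block for at most two seconds, then give up.
   if stats_lock.acquire(timeout=2.0):
       stats_lock.release()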
In addition to these methods, lock objects can also be used via the "with" statement, e.g.: import _thread a_lock = _thread.allocate_lock() with a_lock: print("a_lock is locked while this executes") **Caveats:** * Interrupts always go to the main thread (the "KeyboardInterrupt" exception will be received by that thread.) * Calling "sys.exit()" or raising the "SystemExit" exception is equivalent to calling "_thread.exit()". * It is platform-dependent whether the "acquire()" method on a lock can be interrupted (so that the "KeyboardInterrupt" exception will happen immediately, rather than only after the lock has been acquired or the operation has timed out). It can be interrupted on POSIX, but not on Windows. * When the main thread exits, it is system defined whether the other threads survive. On most systems, they are killed without executing "try" … "finally" clauses or executing object destructors. "abc" — Abstract Base Classes ***************************** **Source code:** Lib/abc.py ====================================================================== This module provides the infrastructure for defining *abstract base classes* (ABCs) in Python, as outlined in **PEP 3119**; see the PEP for why this was added to Python. (See also **PEP 3141** and the "numbers" module regarding a type hierarchy for numbers based on ABCs.) The "collections" module has some concrete classes that derive from ABCs; these can, of course, be further derived. In addition, the "collections.abc" submodule has some ABCs that can be used to test whether a class or instance provides a particular interface, for example, if it is *hashable* or if it is a *mapping*. This module provides the metaclass "ABCMeta" for defining ABCs and a helper class "ABC" to alternatively define ABCs through inheritance: class abc.ABC A helper class that has "ABCMeta" as its metaclass. With this class, an abstract base class can be created by simply deriving from "ABC" avoiding sometimes confusing metaclass usage, for example: from abc import ABC class MyABC(ABC): pass Note that the type of "ABC" is still "ABCMeta", therefore inheriting from "ABC" requires the usual precautions regarding metaclass usage, as multiple inheritance may lead to metaclass conflicts. One may also define an abstract base class by passing the metaclass keyword and using "ABCMeta" directly, for example: from abc import ABCMeta class MyABC(metaclass=ABCMeta): pass Added in version 3.4. class abc.ABCMeta Metaclass for defining Abstract Base Classes (ABCs). Use this metaclass to create an ABC. An ABC can be subclassed directly, and then acts as a mix-in class. You can also register unrelated concrete classes (even built-in classes) and unrelated ABCs as “virtual subclasses” – these and their descendants will be considered subclasses of the registering ABC by the built-in "issubclass()" function, but the registering ABC won’t show up in their MRO (Method Resolution Order) nor will method implementations defined by the registering ABC be callable (not even via "super()"). [1] Classes created with a metaclass of "ABCMeta" have the following method: register(subclass) Register *subclass* as a “virtual subclass” of this ABC. For example: from abc import ABC class MyABC(ABC): pass MyABC.register(tuple) assert issubclass(tuple, MyABC) assert isinstance((), MyABC) Changed in version 3.3: Returns the registered subclass, to allow usage as a class decorator. Changed in version 3.4: To detect calls to "register()", you can use the "get_cache_token()" function. 
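Because "register()" returns the class it is given, it can also be applied as a class decorator; a minimal sketch (the class names are hypothetical):

   from abc import ABC

   class MyABC(ABC):
       pass

   @MyABC.register
   class OtherClass:
       pass

   assert issubclass(OtherClass, MyABC)
   assert isinstance(OtherClass(), MyABC)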
You can also override this method in an abstract base class: __subclasshook__(subclass) (Must be defined as a class method.) Check whether *subclass* is considered a subclass of this ABC. This means that you can customize the behavior of "issubclass()" further without the need to call "register()" on every class you want to consider a subclass of the ABC. (This class method is called from the "__subclasscheck__()" method of the ABC.) This method should return "True", "False" or "NotImplemented". If it returns "True", the *subclass* is considered a subclass of this ABC. If it returns "False", the *subclass* is not considered a subclass of this ABC, even if it would normally be one. If it returns "NotImplemented", the subclass check is continued with the usual mechanism. For a demonstration of these concepts, look at this example ABC definition: class Foo: def __getitem__(self, index): ... def __len__(self): ... def get_iterator(self): return iter(self) class MyIterable(ABC): @abstractmethod def __iter__(self): while False: yield None def get_iterator(self): return self.__iter__() @classmethod def __subclasshook__(cls, C): if cls is MyIterable: if any("__iter__" in B.__dict__ for B in C.__mro__): return True return NotImplemented MyIterable.register(Foo) The ABC "MyIterable" defines the standard iterable method, "__iter__()", as an abstract method. The implementation given here can still be called from subclasses. The "get_iterator()" method is also part of the "MyIterable" abstract base class, but it does not have to be overridden in non-abstract derived classes. The "__subclasshook__()" class method defined here says that any class that has an "__iter__()" method in its "__dict__" (or in that of one of its base classes, accessed via the "__mro__" list) is considered a "MyIterable" too. Finally, the last line makes "Foo" a virtual subclass of "MyIterable", even though it does not define an "__iter__()" method (it uses the old-style iterable protocol, defined in terms of "__len__()" and "__getitem__()"). Note that this will not make "get_iterator" available as a method of "Foo", so it is provided separately. The "abc" module also provides the following decorator: @abc.abstractmethod A decorator indicating abstract methods. Using this decorator requires that the class’s metaclass is "ABCMeta" or is derived from it. A class that has a metaclass derived from "ABCMeta" cannot be instantiated unless all of its abstract methods and properties are overridden. The abstract methods can be called using any of the normal ‘super’ call mechanisms. "abstractmethod()" may be used to declare abstract methods for properties and descriptors. Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once it is created, are only supported using the "update_abstractmethods()" function. The "abstractmethod()" only affects subclasses derived using regular inheritance; “virtual subclasses” registered with the ABC’s "register()" method are not affected. When "abstractmethod()" is applied in combination with other method descriptors, it should be applied as the innermost decorator, as shown in the following usage examples: class C(ABC): @abstractmethod def my_abstract_method(self, arg1): ... @classmethod @abstractmethod def my_abstract_classmethod(cls, arg2): ... @staticmethod @abstractmethod def my_abstract_staticmethod(arg3): ... @property @abstractmethod def my_abstract_property(self): ... 
@my_abstract_property.setter @abstractmethod def my_abstract_property(self, val): ... @abstractmethod def _get_x(self): ... @abstractmethod def _set_x(self, val): ... x = property(_get_x, _set_x) In order to correctly interoperate with the abstract base class machinery, the descriptor must identify itself as abstract using "__isabstractmethod__". In general, this attribute should be "True" if any of the methods used to compose the descriptor are abstract. For example, Python’s built-in "property" does the equivalent of: class Descriptor: ... @property def __isabstractmethod__(self): return any(getattr(f, '__isabstractmethod__', False) for f in (self._fget, self._fset, self._fdel)) Note: Unlike Java abstract methods, these abstract methods may have an implementation. This implementation can be called via the "super()" mechanism from the class that overrides it. This could be useful as an end-point for a super-call in a framework that uses cooperative multiple-inheritance. The "abc" module also supports the following legacy decorators: @abc.abstractclassmethod Added in version 3.2. Deprecated since version 3.3: It is now possible to use "classmethod" with "abstractmethod()", making this decorator redundant. A subclass of the built-in "classmethod()", indicating an abstract classmethod. Otherwise it is similar to "abstractmethod()". This special case is deprecated, as the "classmethod()" decorator is now correctly identified as abstract when applied to an abstract method: class C(ABC): @classmethod @abstractmethod def my_abstract_classmethod(cls, arg): ... @abc.abstractstaticmethod Added in version 3.2. Deprecated since version 3.3: It is now possible to use "staticmethod" with "abstractmethod()", making this decorator redundant. A subclass of the built-in "staticmethod()", indicating an abstract staticmethod. Otherwise it is similar to "abstractmethod()". This special case is deprecated, as the "staticmethod()" decorator is now correctly identified as abstract when applied to an abstract method: class C(ABC): @staticmethod @abstractmethod def my_abstract_staticmethod(arg): ... @abc.abstractproperty Deprecated since version 3.3: It is now possible to use "property", "property.getter()", "property.setter()" and "property.deleter()" with "abstractmethod()", making this decorator redundant. A subclass of the built-in "property()", indicating an abstract property. This special case is deprecated, as the "property()" decorator is now correctly identified as abstract when applied to an abstract method: class C(ABC): @property @abstractmethod def my_abstract_property(self): ... The above example defines a read-only property; you can also define a read-write abstract property by appropriately marking one or more of the underlying methods as abstract: class C(ABC): @property def x(self): ... @x.setter @abstractmethod def x(self, val): ... If only some components are abstract, only those components need to be updated to create a concrete property in a subclass: class D(C): @C.x.setter def x(self, val): ... The "abc" module also provides the following functions: abc.get_cache_token() Returns the current abstract base class cache token. The token is an opaque object (that supports equality testing) identifying the current version of the abstract base class cache for virtual subclasses. The token changes with every call to "ABCMeta.register()" on any ABC. Added in version 3.4. abc.update_abstractmethods(cls) A function to recalculate an abstract class’s abstraction status. 
This function should be called if a class’s abstract methods have been implemented or changed after it was created. Usually, this function should be called from within a class decorator. Returns *cls*, to allow usage as a class decorator. If *cls* is not an instance of "ABCMeta", does nothing. Note: This function assumes that *cls*’s superclasses are already updated. It does not update any subclasses. Added in version 3.10. -[ Footnotes ]- [1] C++ programmers should note that Python’s virtual base class concept is not the same as C++’s. "aifc" — Read and write AIFF and AIFC files ******************************************* Deprecated since version 3.11, removed in version 3.13. This module is no longer part of the Python standard library. It was removed in Python 3.13 after being deprecated in Python 3.11. The removal was decided in **PEP 594**. The last version of Python that provided the "aifc" module was Python 3.12. Generic Operating System Services ********************************* The modules described in this chapter provide interfaces to operating system features that are available on (almost) all operating systems, such as files and a clock. The interfaces are generally modeled after the Unix or C interfaces, but they are available on most other systems as well. Here’s an overview: * "os" — Miscellaneous operating system interfaces * File Names, Command Line Arguments, and Environment Variables * Python UTF-8 Mode * Process Parameters * File Object Creation * File Descriptor Operations * Querying the size of a terminal * Inheritance of File Descriptors * Files and Directories * Timer File Descriptors * Linux extended attributes * Process Management * Interface to the scheduler * Miscellaneous System Information * Random numbers * "io" — Core tools for working with streams * Overview * Text I/O * Binary I/O * Raw I/O * Text Encoding * Opt-in EncodingWarning * High-level Module Interface * Class hierarchy * I/O Base Classes * Raw File I/O * Buffered Streams * Text I/O * Performance * Binary I/O * Text I/O * Multi-threading * Reentrancy * "time" — Time access and conversions * Functions * Clock ID Constants * Timezone Constants * "logging" — Logging facility for Python * Logger Objects * Logging Levels * Handler Objects * Formatter Objects * Filter Objects * LogRecord Objects * LogRecord attributes * LoggerAdapter Objects * Thread Safety * Module-Level Functions * Module-Level Attributes * Integration with the warnings module * "logging.config" — Logging configuration * Configuration functions * Security considerations * Configuration dictionary schema * Dictionary Schema Details * Incremental Configuration * Object connections * User-defined objects * Handler configuration order * Access to external objects * Access to internal objects * Import resolution and custom importers * Configuring QueueHandler and QueueListener * Configuration file format * "logging.handlers" — Logging handlers * StreamHandler * FileHandler * NullHandler * WatchedFileHandler * BaseRotatingHandler * RotatingFileHandler * TimedRotatingFileHandler * SocketHandler * DatagramHandler * SysLogHandler * NTEventLogHandler * SMTPHandler * MemoryHandler * HTTPHandler * QueueHandler * QueueListener * "platform" — Access to underlying platform’s identifying data * Cross platform * Java platform * Windows platform * macOS platform * iOS platform * Unix platforms * Linux platforms * Android platform * Command-line usage * "errno" — Standard errno system symbols * "ctypes" — A foreign function library for Python * ctypes 
tutorial * Loading dynamic link libraries * Accessing functions from loaded dlls * Calling functions * Fundamental data types * Calling functions, continued * Calling variadic functions * Calling functions with your own custom data types * Specifying the required argument types (function prototypes) * Return types * Passing pointers (or: passing parameters by reference) * Structures and unions * Structure/union alignment and byte order * Bit fields in structures and unions * Arrays * Pointers * Type conversions * Incomplete Types * Callback functions * Accessing values exported from dlls * Surprises * Variable-sized data types * ctypes reference * Finding shared libraries * Loading shared libraries * Foreign functions * Function prototypes * Utility functions * Data types * Fundamental data types * Structured data types * Arrays and pointers Data Compression and Archiving ****************************** The modules described in this chapter support data compression with the zlib, gzip, bzip2 and lzma algorithms, and the creation of ZIP- and tar-format archives. See also Archiving operations provided by the "shutil" module. * "zlib" — Compression compatible with **gzip** * "gzip" — Support for **gzip** files * Examples of usage * Command Line Interface * Command line options * "bz2" — Support for **bzip2** compression * (De)compression of files * Incremental (de)compression * One-shot (de)compression * Examples of usage * "lzma" — Compression using the LZMA algorithm * Reading and writing compressed files * Compressing and decompressing data in memory * Miscellaneous * Specifying custom filter chains * Examples * "zipfile" — Work with ZIP archives * ZipFile Objects * Path Objects * PyZipFile Objects * ZipInfo Objects * Command-Line Interface * Command-line options * Decompression pitfalls * From file itself * File System limitations * Resources limitations * Interruption * Default behaviors of extraction * "tarfile" — Read and write tar archive files * TarFile Objects * TarInfo Objects * Extraction filters * Default named filters * Filter errors * Hints for further verification * Supporting older Python versions * Stateful extraction filter example * Command-Line Interface * Command-line options * Examples * Supported tar formats * Unicode issues "argparse" — Parser for command-line options, arguments and subcommands *********************************************************************** Added in version 3.2. **Source code:** Lib/argparse.py Note: While "argparse" is the default recommended standard library module for implementing basic command line applications, authors with more exacting requirements for exactly how their command line applications behave may find it doesn’t provide the necessary level of control. Refer to Choosing an argument parsing library for alternatives to consider when "argparse" doesn’t support behaviors that the application requires (such as entirely disabling support for interspersed options and positional arguments, or accepting option parameter values that start with "-" even when they correspond to another defined option). ====================================================================== Tutorial ^^^^^^^^ This page contains the API reference information. For a more gentle introduction to Python command-line parsing, have a look at the argparse tutorial. The "argparse" module makes it easy to write user-friendly command- line interfaces. The program defines what arguments it requires, and "argparse" will figure out how to parse those out of "sys.argv". 
The "argparse" module also automatically generates help and usage messages. The module will also issue errors when users give the program invalid arguments. The "argparse" module’s support for command-line interfaces is built around an instance of "argparse.ArgumentParser". It is a container for argument specifications and has options that apply to the parser as whole: parser = argparse.ArgumentParser( prog='ProgramName', description='What the program does', epilog='Text at the bottom of help') The "ArgumentParser.add_argument()" method attaches individual argument specifications to the parser. It supports positional arguments, options that accept values, and on/off flags: parser.add_argument('filename') # positional argument parser.add_argument('-c', '--count') # option that takes a value parser.add_argument('-v', '--verbose', action='store_true') # on/off flag The "ArgumentParser.parse_args()" method runs the parser and places the extracted data in a "argparse.Namespace" object: args = parser.parse_args() print(args.filename, args.count, args.verbose) Note: If you’re looking for a guide about how to upgrade "optparse" code to "argparse", see Upgrading Optparse Code. ArgumentParser objects ====================== class argparse.ArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=argparse.HelpFormatter, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True) Create a new "ArgumentParser" object. All parameters should be passed as keyword arguments. Each parameter has its own more detailed description below, but in short they are: * prog - The name of the program (default: "os.path.basename(sys.argv[0])") * usage - The string describing the program usage (default: generated from arguments added to parser) * description - Text to display before the argument help (by default, no text) * epilog - Text to display after the argument help (by default, no text) * parents - A list of "ArgumentParser" objects whose arguments should also be included * formatter_class - A class for customizing the help output * prefix_chars - The set of characters that prefix optional arguments (default: ‘-‘) * fromfile_prefix_chars - The set of characters that prefix files from which additional arguments should be read (default: "None") * argument_default - The global default value for arguments (default: "None") * conflict_handler - The strategy for resolving conflicting optionals (usually unnecessary) * add_help - Add a "-h/--help" option to the parser (default: "True") * allow_abbrev - Allows long options to be abbreviated if the abbreviation is unambiguous. (default: "True") * exit_on_error - Determines whether or not "ArgumentParser" exits with error info when an error occurs. (default: "True") Changed in version 3.5: *allow_abbrev* parameter was added. Changed in version 3.8: In previous versions, *allow_abbrev* also disabled grouping of short flags such as "-vv" to mean "-v -v". Changed in version 3.9: *exit_on_error* parameter was added. The following sections describe how each of these are used. prog ---- By default, "ArgumentParser" calculates the name of the program to display in help messages depending on the way the Python interpreter was run: * The "base name" of "sys.argv[0]" if a file was passed as argument. * The Python interpreter name followed by "sys.argv[0]" if a directory or a zipfile was passed as argument. 
* The Python interpreter name followed by "-m" followed by the module or package name if the "-m" option was used. This default is almost always desirable because it will make the help messages match the string that was used to invoke the program on the command line. However, to change this default behavior, another value can be supplied using the "prog=" argument to "ArgumentParser": >>> parser = argparse.ArgumentParser(prog='myprogram') >>> parser.print_help() usage: myprogram [-h] options: -h, --help show this help message and exit Note that the program name, whether determined from "sys.argv[0]" or from the "prog=" argument, is available to help messages using the "%(prog)s" format specifier. >>> parser = argparse.ArgumentParser(prog='myprogram') >>> parser.add_argument('--foo', help='foo of the %(prog)s program') >>> parser.print_help() usage: myprogram [-h] [--foo FOO] options: -h, --help show this help message and exit --foo FOO foo of the myprogram program usage ----- By default, "ArgumentParser" calculates the usage message from the arguments it contains. The default message can be overridden with the "usage=" keyword argument: >>> parser = argparse.ArgumentParser(prog='PROG', usage='%(prog)s [options]') >>> parser.add_argument('--foo', nargs='?', help='foo help') >>> parser.add_argument('bar', nargs='+', help='bar help') >>> parser.print_help() usage: PROG [options] positional arguments: bar bar help options: -h, --help show this help message and exit --foo [FOO] foo help The "%(prog)s" format specifier is available to fill in the program name in your usage messages. description ----------- Most calls to the "ArgumentParser" constructor will use the "description=" keyword argument. This argument gives a brief description of what the program does and how it works. In help messages, the description is displayed between the command-line usage string and the help messages for the various arguments. By default, the description will be line-wrapped so that it fits within the given space. To change this behavior, see the formatter_class argument. epilog ------ Some programs like to display additional description of the program after the description of the arguments. Such text can be specified using the "epilog=" argument to "ArgumentParser": >>> parser = argparse.ArgumentParser( ... description='A foo that bars', ... epilog="And that's how you'd foo a bar") >>> parser.print_help() usage: argparse.py [-h] A foo that bars options: -h, --help show this help message and exit And that's how you'd foo a bar As with the description argument, the "epilog=" text is by default line-wrapped, but this behavior can be adjusted with the formatter_class argument to "ArgumentParser". parents ------- Sometimes, several parsers share a common set of arguments. Rather than repeating the definitions of these arguments, a single parser containing all the shared arguments can be created and passed to the "parents=" argument of "ArgumentParser".
The "parents=" argument takes a list of "ArgumentParser" objects, collects all the positional and optional actions from them, and adds these actions to the "ArgumentParser" object being constructed: >>> parent_parser = argparse.ArgumentParser(add_help=False) >>> parent_parser.add_argument('--parent', type=int) >>> foo_parser = argparse.ArgumentParser(parents=[parent_parser]) >>> foo_parser.add_argument('foo') >>> foo_parser.parse_args(['--parent', '2', 'XXX']) Namespace(foo='XXX', parent=2) >>> bar_parser = argparse.ArgumentParser(parents=[parent_parser]) >>> bar_parser.add_argument('--bar') >>> bar_parser.parse_args(['--bar', 'YYY']) Namespace(bar='YYY', parent=None) Note that most parent parsers will specify "add_help=False". Otherwise, the "ArgumentParser" will see two "-h/--help" options (one in the parent and one in the child) and raise an error. Note: You must fully initialize the parsers before passing them via "parents=". If you change the parent parsers after the child parser, those changes will not be reflected in the child. formatter_class --------------- "ArgumentParser" objects allow the help formatting to be customized by specifying an alternate formatting class. Currently, there are four such classes: class argparse.RawDescriptionHelpFormatter class argparse.RawTextHelpFormatter class argparse.ArgumentDefaultsHelpFormatter class argparse.MetavarTypeHelpFormatter "RawDescriptionHelpFormatter" and "RawTextHelpFormatter" give more control over how textual descriptions are displayed. By default, "ArgumentParser" objects line-wrap the description and epilog texts in command-line help messages: >>> parser = argparse.ArgumentParser( ... prog='PROG', ... description='''this description ... was indented weird ... but that is okay''', ... epilog=''' ... likewise for this epilog whose whitespace will ... be cleaned up and whose words will be wrapped ... across a couple lines''') >>> parser.print_help() usage: PROG [-h] this description was indented weird but that is okay options: -h, --help show this help message and exit likewise for this epilog whose whitespace will be cleaned up and whose words will be wrapped across a couple lines Passing "RawDescriptionHelpFormatter" as "formatter_class=" indicates that description and epilog are already correctly formatted and should not be line-wrapped: >>> parser = argparse.ArgumentParser( ... prog='PROG', ... formatter_class=argparse.RawDescriptionHelpFormatter, ... description=textwrap.dedent('''\ ... Please do not mess up this text! ... -------------------------------- ... I have indented it ... exactly the way ... I want it ... ''')) >>> parser.print_help() usage: PROG [-h] Please do not mess up this text! -------------------------------- I have indented it exactly the way I want it options: -h, --help show this help message and exit "RawTextHelpFormatter" maintains whitespace for all sorts of help text, including argument descriptions. However, multiple newlines are replaced with one. If you wish to preserve multiple blank lines, add spaces between the newlines. "ArgumentDefaultsHelpFormatter" automatically adds information about default values to each of the argument help messages: >>> parser = argparse.ArgumentParser( ... prog='PROG', ... formatter_class=argparse.ArgumentDefaultsHelpFormatter) >>> parser.add_argument('--foo', type=int, default=42, help='FOO!') >>> parser.add_argument('bar', nargs='*', default=[1, 2, 3], help='BAR!') >>> parser.print_help() usage: PROG [-h] [--foo FOO] [bar ...] positional arguments: bar BAR! 
(default: [1, 2, 3]) options: -h, --help show this help message and exit --foo FOO FOO! (default: 42) "MetavarTypeHelpFormatter" uses the name of the type argument for each argument as the display name for its values (rather than using the dest as the regular formatter does): >>> parser = argparse.ArgumentParser( ... prog='PROG', ... formatter_class=argparse.MetavarTypeHelpFormatter) >>> parser.add_argument('--foo', type=int) >>> parser.add_argument('bar', type=float) >>> parser.print_help() usage: PROG [-h] [--foo int] float positional arguments: float options: -h, --help show this help message and exit --foo int prefix_chars ------------ Most command-line options will use "-" as the prefix, e.g. "-f/--foo". Parsers that need to support different or additional prefix characters, e.g. for options like "+f" or "/foo", may specify them using the "prefix_chars=" argument to the "ArgumentParser" constructor: >>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='-+') >>> parser.add_argument('+f') >>> parser.add_argument('++bar') >>> parser.parse_args('+f X ++bar Y'.split()) Namespace(bar='Y', f='X') The "prefix_chars=" argument defaults to "'-'". Supplying a set of characters that does not include "-" will cause "-f/--foo" options to be disallowed. fromfile_prefix_chars --------------------- Sometimes, when dealing with a particularly long argument list, it may make sense to keep the list of arguments in a file rather than typing it out at the command line. If the "fromfile_prefix_chars=" argument is given to the "ArgumentParser" constructor, then arguments that start with any of the specified characters will be treated as files, and will be replaced by the arguments they contain. For example: >>> with open('args.txt', 'w', encoding=sys.getfilesystemencoding()) as fp: ... fp.write('-f\nbar') ... >>> parser = argparse.ArgumentParser(fromfile_prefix_chars='@') >>> parser.add_argument('-f') >>> parser.parse_args(['-f', 'foo', '@args.txt']) Namespace(f='bar') Arguments read from a file must by default be one per line (but see also "convert_arg_line_to_args()") and are treated as if they were in the same place as the original file referencing argument on the command line. So in the example above, the expression "['-f', 'foo', '@args.txt']" is considered equivalent to the expression "['-f', 'foo', '-f', 'bar']". "ArgumentParser" uses *filesystem encoding and error handler* to read the file containing arguments. The "fromfile_prefix_chars=" argument defaults to "None", meaning that arguments will never be treated as file references. Changed in version 3.12: "ArgumentParser" changed encoding and errors to read arguments files from default (e.g. "locale.getpreferredencoding(False)" and ""strict"") to the *filesystem encoding and error handler*. Arguments file should be encoded in UTF-8 instead of ANSI Codepage on Windows. argument_default ---------------- Generally, argument defaults are specified either by passing a default to "add_argument()" or by calling the "set_defaults()" methods with a specific set of name-value pairs. Sometimes however, it may be useful to specify a single parser-wide default for arguments. This can be accomplished by passing the "argument_default=" keyword argument to "ArgumentParser". 
For example, to globally suppress attribute creation on "parse_args()" calls, we supply "argument_default=SUPPRESS": >>> parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS) >>> parser.add_argument('--foo') >>> parser.add_argument('bar', nargs='?') >>> parser.parse_args(['--foo', '1', 'BAR']) Namespace(bar='BAR', foo='1') >>> parser.parse_args([]) Namespace() allow_abbrev ------------ Normally, when you pass an argument list to the "parse_args()" method of an "ArgumentParser", it recognizes abbreviations of long options. This feature can be disabled by setting "allow_abbrev" to "False": >>> parser = argparse.ArgumentParser(prog='PROG', allow_abbrev=False) >>> parser.add_argument('--foobar', action='store_true') >>> parser.add_argument('--foonley', action='store_false') >>> parser.parse_args(['--foon']) usage: PROG [-h] [--foobar] [--foonley] PROG: error: unrecognized arguments: --foon Added in version 3.5. conflict_handler ---------------- "ArgumentParser" objects do not allow two actions with the same option string. By default, "ArgumentParser" objects raise an exception if an attempt is made to create an argument with an option string that is already in use: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-f', '--foo', help='old foo help') >>> parser.add_argument('--foo', help='new foo help') Traceback (most recent call last): ... ArgumentError: argument --foo: conflicting option string(s): --foo Sometimes (e.g. when using parents) it may be useful to simply override any older arguments with the same option string. To get this behavior, the value "'resolve'" can be supplied to the "conflict_handler=" argument of "ArgumentParser": >>> parser = argparse.ArgumentParser(prog='PROG', conflict_handler='resolve') >>> parser.add_argument('-f', '--foo', help='old foo help') >>> parser.add_argument('--foo', help='new foo help') >>> parser.print_help() usage: PROG [-h] [-f FOO] [--foo FOO] options: -h, --help show this help message and exit -f FOO old foo help --foo FOO new foo help Note that "ArgumentParser" objects only remove an action if all of its option strings are overridden. So, in the example above, the old "-f/--foo" action is retained as the "-f" action, because only the "--foo" option string was overridden. add_help -------- By default, "ArgumentParser" objects add an option which simply displays the parser’s help message. If "-h" or "--help" is supplied at the command line, the "ArgumentParser" help will be printed. Occasionally, it may be useful to disable the addition of this help option. This can be achieved by passing "False" as the "add_help=" argument to "ArgumentParser": >>> parser = argparse.ArgumentParser(prog='PROG', add_help=False) >>> parser.add_argument('--foo', help='foo help') >>> parser.print_help() usage: PROG [--foo FOO] options: --foo FOO foo help The help option is typically "-h/--help". The exception to this is if the "prefix_chars=" is specified and does not include "-", in which case "-h" and "--help" are not valid options. In this case, the first character in "prefix_chars" is used to prefix the help options: >>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='+/') >>> parser.print_help() usage: PROG [+h] options: +h, ++help show this help message and exit exit_on_error ------------- Normally, when you pass an invalid argument list to the "parse_args()" method of an "ArgumentParser", it will print a *message* to "sys.stderr" and exit with a status code of 2.
If the user would like to catch errors manually, the feature can be enabled by setting "exit_on_error" to "False": >>> parser = argparse.ArgumentParser(exit_on_error=False) >>> parser.add_argument('--integers', type=int) _StoreAction(option_strings=['--integers'], dest='integers', nargs=None, const=None, default=None, type=<class 'int'>, choices=None, help=None, metavar=None) >>> try: ... parser.parse_args('--integers a'.split()) ... except argparse.ArgumentError: ... print('Catching an argumentError') ... Catching an argumentError Added in version 3.9. The add_argument() method ========================= ArgumentParser.add_argument(name or flags..., *[, action][, nargs][, const][, default][, type][, choices][, required][, help][, metavar][, dest][, deprecated]) Define how a single command-line argument should be parsed. Each parameter has its own more detailed description below, but in short they are: * name or flags - Either a name or a list of option strings, e.g. "'foo'" or "'-f', '--foo'". * action - The basic type of action to be taken when this argument is encountered at the command line. * nargs - The number of command-line arguments that should be consumed. * const - A constant value required by some action and nargs selections. * default - The value produced if the argument is absent from the command line and if it is absent from the namespace object. * type - The type to which the command-line argument should be converted. * choices - A sequence of the allowable values for the argument. * required - Whether or not the command-line option may be omitted (optionals only). * help - A brief description of what the argument does. * metavar - A name for the argument in usage messages. * dest - The name of the attribute to be added to the object returned by "parse_args()". * deprecated - Whether or not use of the argument is deprecated. The following sections describe how each of these are used. name or flags ------------- The "add_argument()" method must know whether an optional argument, like "-f" or "--foo", or a positional argument, like a list of filenames, is expected. The first arguments passed to "add_argument()" must therefore be either a series of flags, or a simple argument name. For example, an optional argument could be created like: >>> parser.add_argument('-f', '--foo') while a positional argument could be created like: >>> parser.add_argument('bar') When "parse_args()" is called, optional arguments will be identified by the "-" prefix, and the remaining arguments will be assumed to be positional: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-f', '--foo') >>> parser.add_argument('bar') >>> parser.parse_args(['BAR']) Namespace(bar='BAR', foo=None) >>> parser.parse_args(['BAR', '--foo', 'FOO']) Namespace(bar='BAR', foo='FOO') >>> parser.parse_args(['--foo', 'FOO']) usage: PROG [-h] [-f FOO] bar PROG: error: the following arguments are required: bar action ------ "ArgumentParser" objects associate command-line arguments with actions. These actions can do just about anything with the command-line arguments associated with them, though most actions simply add an attribute to the object returned by "parse_args()". The "action" keyword argument specifies how the command-line arguments should be handled. The supplied actions are: * "'store'" - This just stores the argument’s value. This is the default action. * "'store_const'" - This stores the value specified by the const keyword argument; note that the const keyword argument defaults to "None".
The "'store_const'" action is most commonly used with optional arguments that specify some sort of flag. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action='store_const', const=42) >>> parser.parse_args(['--foo']) Namespace(foo=42) * "'store_true'" and "'store_false'" - These are special cases of "'store_const'" used for storing the values "True" and "False" respectively. In addition, they create default values of "False" and "True" respectively: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action='store_true') >>> parser.add_argument('--bar', action='store_false') >>> parser.add_argument('--baz', action='store_false') >>> parser.parse_args('--foo --bar'.split()) Namespace(foo=True, bar=False, baz=True) * "'append'" - This stores a list, and appends each argument value to the list. It is useful to allow an option to be specified multiple times. If the default value is non-empty, the default elements will be present in the parsed value for the option, with any values from the command line appended after those default values. Example usage: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action='append') >>> parser.parse_args('--foo 1 --foo 2'.split()) Namespace(foo=['1', '2']) * "'append_const'" - This stores a list, and appends the value specified by the const keyword argument to the list; note that the const keyword argument defaults to "None". The "'append_const'" action is typically useful when multiple arguments need to store constants to the same list. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--str', dest='types', action='append_const', const=str) >>> parser.add_argument('--int', dest='types', action='append_const', const=int) >>> parser.parse_args('--str --int'.split()) Namespace(types=[, ]) * "'extend'" - This stores a list and appends each item from the multi-value argument list to it. The "'extend'" action is typically used with the nargs keyword argument value "'+'" or "'*'". Note that when nargs is "None" (the default) or "'?'", each character of the argument string will be appended to the list. Example usage: >>> parser = argparse.ArgumentParser() >>> parser.add_argument("--foo", action="extend", nargs="+", type=str) >>> parser.parse_args(["--foo", "f1", "--foo", "f2", "f3", "f4"]) Namespace(foo=['f1', 'f2', 'f3', 'f4']) Added in version 3.8. * "'count'" - This counts the number of times a keyword argument occurs. For example, this is useful for increasing verbosity levels: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--verbose', '-v', action='count', default=0) >>> parser.parse_args(['-vvv']) Namespace(verbose=3) Note, the *default* will be "None" unless explicitly set to *0*. * "'help'" - This prints a complete help message for all the options in the current parser and then exits. By default a help action is automatically added to the parser. See "ArgumentParser" for details of how the output is created. * "'version'" - This expects a "version=" keyword argument in the "add_argument()" call, and prints version information and exits when invoked: >>> import argparse >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('--version', action='version', version='%(prog)s 2.0') >>> parser.parse_args(['--version']) PROG 2.0 Only actions that consume command-line arguments (e.g. "'store'", "'append'" or "'extend'") can be used with positional arguments. 
class argparse.BooleanOptionalAction You may also specify an arbitrary action by passing an "Action" subclass or other object that implements the same interface. The "BooleanOptionalAction" is available in "argparse" and adds support for boolean actions such as "--foo" and "--no-foo": >>> import argparse >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action=argparse.BooleanOptionalAction) >>> parser.parse_args(['--no-foo']) Namespace(foo=False) Added in version 3.9. The recommended way to create a custom action is to extend "Action", overriding the "__call__()" method and optionally the "__init__()" and "format_usage()" methods. You can also register custom actions using the "register()" method and reference them by their registered name. An example of a custom action: >>> class FooAction(argparse.Action): ... def __init__(self, option_strings, dest, nargs=None, **kwargs): ... if nargs is not None: ... raise ValueError("nargs not allowed") ... super().__init__(option_strings, dest, **kwargs) ... def __call__(self, parser, namespace, values, option_string=None): ... print('%r %r %r' % (namespace, values, option_string)) ... setattr(namespace, self.dest, values) ... >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action=FooAction) >>> parser.add_argument('bar', action=FooAction) >>> args = parser.parse_args('1 --foo 2'.split()) Namespace(bar=None, foo=None) '1' None Namespace(bar='1', foo=None) '2' '--foo' >>> args Namespace(bar='1', foo='2') For more details, see "Action". nargs ----- "ArgumentParser" objects usually associate a single command-line argument with a single action to be taken. The "nargs" keyword argument associates a different number of command-line arguments with a single action. See also Specifying ambiguous arguments. The supported values are: * "N" (an integer). "N" arguments from the command line will be gathered together into a list. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', nargs=2) >>> parser.add_argument('bar', nargs=1) >>> parser.parse_args('c --foo a b'.split()) Namespace(bar=['c'], foo=['a', 'b']) Note that "nargs=1" produces a list of one item. This is different from the default, in which the item is produced by itself. * "'?'". One argument will be consumed from the command line if possible, and produced as a single item. If no command-line argument is present, the value from default will be produced. Note that for optional arguments, there is an additional case - the option string is present but not followed by a command-line argument. In this case the value from const will be produced. Some examples to illustrate this: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', nargs='?', const='c', default='d') >>> parser.add_argument('bar', nargs='?', default='d') >>> parser.parse_args(['XX', '--foo', 'YY']) Namespace(bar='XX', foo='YY') >>> parser.parse_args(['XX', '--foo']) Namespace(bar='XX', foo='c') >>> parser.parse_args([]) Namespace(bar='d', foo='d') One of the more common uses of "nargs='?'" is to allow optional input and output files: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('infile', nargs='?', type=argparse.FileType('r'), ... default=sys.stdin) >>> parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'), ... 
default=sys.stdout) >>> parser.parse_args(['input.txt', 'output.txt']) Namespace(infile=<_io.TextIOWrapper name='input.txt' encoding='UTF-8'>, outfile=<_io.TextIOWrapper name='output.txt' encoding='UTF-8'>) >>> parser.parse_args([]) Namespace(infile=<_io.TextIOWrapper name='<stdin>' encoding='UTF-8'>, outfile=<_io.TextIOWrapper name='<stdout>' encoding='UTF-8'>) * "'*'". All command-line arguments present are gathered into a list. Note that it generally doesn’t make much sense to have more than one positional argument with "nargs='*'", but multiple optional arguments with "nargs='*'" is possible. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', nargs='*') >>> parser.add_argument('--bar', nargs='*') >>> parser.add_argument('baz', nargs='*') >>> parser.parse_args('a b --foo x y --bar 1 2'.split()) Namespace(bar=['1', '2'], baz=['a', 'b'], foo=['x', 'y']) * "'+'". Just like "'*'", all command-line args present are gathered into a list. Additionally, an error message will be generated if there wasn’t at least one command-line argument present. For example: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('foo', nargs='+') >>> parser.parse_args(['a', 'b']) Namespace(foo=['a', 'b']) >>> parser.parse_args([]) usage: PROG [-h] foo [foo ...] PROG: error: the following arguments are required: foo If the "nargs" keyword argument is not provided, the number of arguments consumed is determined by the action. Generally this means a single command-line argument will be consumed and a single item (not a list) will be produced. Actions that do not consume command-line arguments (e.g. "'store_const'") set "nargs=0". const ----- The "const" argument of "add_argument()" is used to hold constant values that are not read from the command line but are required for the various "ArgumentParser" actions. The two most common uses of it are: * When "add_argument()" is called with "action='store_const'" or "action='append_const'". These actions add the "const" value to one of the attributes of the object returned by "parse_args()". See the action description for examples. If "const" is not provided to "add_argument()", it will receive a default value of "None". * When "add_argument()" is called with option strings (like "-f" or "--foo") and "nargs='?'". This creates an optional argument that can be followed by zero or one command-line arguments. When parsing the command line, if the option string is encountered with no command-line argument following it, the value of "const" will be assumed to be "None" instead. See the nargs description for examples. Changed in version 3.11: "const=None" by default, including when "action='append_const'" or "action='store_const'". default ------- All optional arguments and some positional arguments may be omitted at the command line. The "default" keyword argument of "add_argument()", whose value defaults to "None", specifies what value should be used if the command-line argument is not present.
For optional arguments, the "default" value is used when the option string was not present at the command line: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default=42) >>> parser.parse_args(['--foo', '2']) Namespace(foo='2') >>> parser.parse_args([]) Namespace(foo=42) If the target namespace already has an attribute set, the action *default* will not overwrite it: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default=42) >>> parser.parse_args([], namespace=argparse.Namespace(foo=101)) Namespace(foo=101) If the "default" value is a string, the parser parses the value as if it were a command-line argument. In particular, the parser applies any type conversion argument, if provided, before setting the attribute on the "Namespace" return value. Otherwise, the parser uses the value as is: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--length', default='10', type=int) >>> parser.add_argument('--width', default=10.5, type=int) >>> parser.parse_args() Namespace(length=10, width=10.5) For positional arguments with nargs equal to "?" or "*", the "default" value is used when no command-line argument was present: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('foo', nargs='?', default=42) >>> parser.parse_args(['a']) Namespace(foo='a') >>> parser.parse_args([]) Namespace(foo=42) For required arguments, the "default" value is ignored. For example, this applies to positional arguments with nargs values other than "?" or "*", or optional arguments marked as "required=True". Providing "default=argparse.SUPPRESS" causes no attribute to be added if the command-line argument was not present: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default=argparse.SUPPRESS) >>> parser.parse_args([]) Namespace() >>> parser.parse_args(['--foo', '1']) Namespace(foo='1') type ---- By default, the parser reads command-line arguments in as simple strings. However, quite often the command-line string should instead be interpreted as another type, such as a "float" or "int". The "type" keyword for "add_argument()" allows any necessary type-checking and type conversions to be performed. If the type keyword is used with the default keyword, the type converter is only applied if the default is a string. The argument to "type" can be a callable that accepts a single string or the name of a registered type (see "register()"). If the function raises "ArgumentTypeError", "TypeError", or "ValueError", the exception is caught and a nicely formatted error message is displayed. Other exception types are not handled. Common built-in types and functions can be used as type converters: import argparse import pathlib parser = argparse.ArgumentParser() parser.add_argument('count', type=int) parser.add_argument('distance', type=float) parser.add_argument('street', type=ascii) parser.add_argument('code_point', type=ord) parser.add_argument('dest_file', type=argparse.FileType('w', encoding='latin-1')) parser.add_argument('datapath', type=pathlib.Path) User-defined functions can be used as well: >>> def hyphenated(string): ... return '-'.join([word[:4] for word in string.casefold().split()]) ... >>> parser = argparse.ArgumentParser() >>> _ = parser.add_argument('short_title', type=hyphenated) >>> parser.parse_args(['"The Tale of Two Cities"']) Namespace(short_title='"the-tale-of-two-citi') The "bool()" function is not recommended as a type converter. All it does is convert empty strings to "False" and non-empty strings to "True".
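For example, because any non-empty string is truthy, even the string "'False'" converts to "True". An illustrative doctest (the "--flag" option is hypothetical, not one of the documented examples):

>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--flag', type=bool)
>>> parser.parse_args(['--flag', 'False'])
Namespace(flag=True)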
This is usually not what is desired. In general, the "type" keyword is a convenience that should only be used for simple conversions that can only raise one of the three supported exceptions. Anything with more interesting error-handling or resource management should be done downstream after the arguments are parsed. For example, JSON or YAML conversions have complex error cases that require better reporting than can be given by the "type" keyword. A "JSONDecodeError" would not be well formatted and a "FileNotFoundError" exception would not be handled at all. Even "FileType" has its limitations for use with the "type" keyword. If one argument uses "FileType" and then a subsequent argument fails, an error is reported but the file is not automatically closed. In this case, it would be better to wait until after the parser has run and then use the "with"-statement to manage the files. For type checkers that simply check against a fixed set of values, consider using the choices keyword instead. choices ------- Some command-line arguments should be selected from a restricted set of values. These can be handled by passing a sequence object as the *choices* keyword argument to "add_argument()". When the command line is parsed, argument values will be checked, and an error message will be displayed if the argument was not one of the acceptable values: >>> parser = argparse.ArgumentParser(prog='game.py') >>> parser.add_argument('move', choices=['rock', 'paper', 'scissors']) >>> parser.parse_args(['rock']) Namespace(move='rock') >>> parser.parse_args(['fire']) usage: game.py [-h] {rock,paper,scissors} game.py: error: argument move: invalid choice: 'fire' (choose from 'rock', 'paper', 'scissors') Note that inclusion in the *choices* sequence is checked after any type conversions have been performed, so the type of the objects in the *choices* sequence should match the type specified. Any sequence can be passed as the *choices* value, so "list" objects, "tuple" objects, and custom sequences are all supported. Use of "enum.Enum" is not recommended because it is difficult to control its appearance in usage, help, and error messages. Formatted choices override the default *metavar* which is normally derived from *dest*. This is usually what you want because the user never sees the *dest* parameter. If this display isn’t desirable (perhaps because there are many choices), just specify an explicit metavar. required -------- In general, the "argparse" module assumes that flags like "-f" and "--bar" indicate *optional* arguments, which can always be omitted at the command line. To make an option *required*, "True" can be specified for the "required=" keyword argument to "add_argument()": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', required=True) >>> parser.parse_args(['--foo', 'BAR']) Namespace(foo='BAR') >>> parser.parse_args([]) usage: [-h] --foo FOO : error: the following arguments are required: --foo As the example shows, if an option is marked as "required", "parse_args()" will report an error if that option is not present at the command line. Note: Required options are generally considered bad form because users expect *options* to be *optional*, and thus they should be avoided when possible. help ---- The "help" value is a string containing a brief description of the argument. When a user requests help (usually by using "-h" or "--help" at the command line), these "help" descriptions will be displayed with each argument.
The "help" strings can include various format specifiers to avoid repetition of things like the program name or the argument default. The available specifiers include the program name, "%(prog)s" and most keyword arguments to "add_argument()", e.g. "%(default)s", "%(type)s", etc.: >>> parser = argparse.ArgumentParser(prog='frobble') >>> parser.add_argument('bar', nargs='?', type=int, default=42, ... help='the bar to %(prog)s (default: %(default)s)') >>> parser.print_help() usage: frobble [-h] [bar] positional arguments: bar the bar to frobble (default: 42) options: -h, --help show this help message and exit As the help string supports %-formatting, if you want a literal "%" to appear in the help string, you must escape it as "%%". "argparse" supports silencing the help entry for certain options, by setting the "help" value to "argparse.SUPPRESS": >>> parser = argparse.ArgumentParser(prog='frobble') >>> parser.add_argument('--foo', help=argparse.SUPPRESS) >>> parser.print_help() usage: frobble [-h] options: -h, --help show this help message and exit metavar ------- When "ArgumentParser" generates help messages, it needs some way to refer to each expected argument. By default, "ArgumentParser" objects use the dest value as the “name” of each object. By default, for positional argument actions, the dest value is used directly, and for optional argument actions, the dest value is uppercased. So, a single positional argument with "dest='bar'" will be referred to as "bar". A single optional argument "--foo" that should be followed by a single command-line argument will be referred to as "FOO". An example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> parser.add_argument('bar') >>> parser.parse_args('X --foo Y'.split()) Namespace(bar='X', foo='Y') >>> parser.print_help() usage: [-h] [--foo FOO] bar positional arguments: bar options: -h, --help show this help message and exit --foo FOO An alternative name can be specified with "metavar": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', metavar='YYY') >>> parser.add_argument('bar', metavar='XXX') >>> parser.parse_args('X --foo Y'.split()) Namespace(bar='X', foo='Y') >>> parser.print_help() usage: [-h] [--foo YYY] XXX positional arguments: XXX options: -h, --help show this help message and exit --foo YYY Note that "metavar" only changes the *displayed* name - the name of the attribute on the "parse_args()" object is still determined by the dest value. Different values of "nargs" may cause the metavar to be used multiple times. Providing a tuple to "metavar" specifies a different display for each of the arguments: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-x', nargs=2) >>> parser.add_argument('--foo', nargs=2, metavar=('bar', 'baz')) >>> parser.print_help() usage: PROG [-h] [-x X X] [--foo bar baz] options: -h, --help show this help message and exit -x X X --foo bar baz dest ---- Most "ArgumentParser" actions add some value as an attribute of the object returned by "parse_args()". The name of this attribute is determined by the "dest" keyword argument of "add_argument()". For positional argument actions, "dest" is normally supplied as the first argument to "add_argument()": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('bar') >>> parser.parse_args(['XXX']) Namespace(bar='XXX') For optional argument actions, the value of "dest" is normally inferred from the option strings. 
"ArgumentParser" generates the value of "dest" by taking the first long option string and stripping away the initial "--" string. If no long option strings were supplied, "dest" will be derived from the first short option string by stripping the initial "-" character. Any internal "-" characters will be converted to "_" characters to make sure the string is a valid attribute name. The examples below illustrate this behavior: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('-f', '--foo-bar', '--foo') >>> parser.add_argument('-x', '-y') >>> parser.parse_args('-f 1 -x 2'.split()) Namespace(foo_bar='1', x='2') >>> parser.parse_args('--foo 1 -y 2'.split()) Namespace(foo_bar='1', x='2') "dest" allows a custom attribute name to be provided: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', dest='bar') >>> parser.parse_args('--foo XXX'.split()) Namespace(bar='XXX') deprecated ---------- During a project’s lifetime, some arguments may need to be removed from the command line. Before removing them, you should inform your users that the arguments are deprecated and will be removed. The "deprecated" keyword argument of "add_argument()", which defaults to "False", specifies if the argument is deprecated and will be removed in the future. For arguments, if "deprecated" is "True", then a warning will be printed to "sys.stderr" when the argument is used: >>> import argparse >>> parser = argparse.ArgumentParser(prog='snake.py') >>> parser.add_argument('--legs', default=0, type=int, deprecated=True) >>> parser.parse_args([]) Namespace(legs=0) >>> parser.parse_args(['--legs', '4']) snake.py: warning: option '--legs' is deprecated Namespace(legs=4) Added in version 3.13. Action classes -------------- "Action" classes implement the Action API, a callable which returns a callable which processes arguments from the command-line. Any object which follows this API may be passed as the "action" parameter to "add_argument()". class argparse.Action(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None) "Action" objects are used by an "ArgumentParser" to represent the information needed to parse a single argument from one or more strings from the command line. The "Action" class must accept the two positional arguments plus any keyword arguments passed to "ArgumentParser.add_argument()" except for the "action" itself. Instances of "Action" (or return value of any callable to the "action" parameter) should have attributes "dest", "option_strings", "default", "type", "required", "help", etc. defined. The easiest way to ensure these attributes are defined is to call "Action.__init__()". __call__(parser, namespace, values, option_string=None) "Action" instances should be callable, so subclasses must override the "__call__()" method, which should accept four parameters: * *parser* - The "ArgumentParser" object which contains this action. * *namespace* - The "Namespace" object that will be returned by "parse_args()". Most actions add an attribute to this object using "setattr()". * *values* - The associated command-line arguments, with any type conversions applied. Type conversions are specified with the type keyword argument to "add_argument()". * *option_string* - The option string that was used to invoke this action. The "option_string" argument is optional, and will be absent if the action is associated with a positional argument. 
The "__call__()" method may perform arbitrary actions, but will typically set attributes on the "namespace" based on "dest" and "values". format_usage() "Action" subclasses can define a "format_usage()" method that takes no argument and return a string which will be used when printing the usage of the program. If such method is not provided, a sensible default will be used. The parse_args() method ======================= ArgumentParser.parse_args(args=None, namespace=None) Convert argument strings to objects and assign them as attributes of the namespace. Return the populated namespace. Previous calls to "add_argument()" determine exactly what objects are created and how they are assigned. See the documentation for "add_argument()" for details. * args - List of strings to parse. The default is taken from "sys.argv". * namespace - An object to take the attributes. The default is a new empty "Namespace" object. Option value syntax ------------------- The "parse_args()" method supports several ways of specifying the value of an option (if it takes one). In the simplest case, the option and its value are passed as two separate arguments: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-x') >>> parser.add_argument('--foo') >>> parser.parse_args(['-x', 'X']) Namespace(foo=None, x='X') >>> parser.parse_args(['--foo', 'FOO']) Namespace(foo='FOO', x=None) For long options (options with names longer than a single character), the option and value can also be passed as a single command-line argument, using "=" to separate them: >>> parser.parse_args(['--foo=FOO']) Namespace(foo='FOO', x=None) For short options (options only one character long), the option and its value can be concatenated: >>> parser.parse_args(['-xX']) Namespace(foo=None, x='X') Several short options can be joined together, using only a single "-" prefix, as long as only the last option (or none of them) requires a value: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-x', action='store_true') >>> parser.add_argument('-y', action='store_true') >>> parser.add_argument('-z') >>> parser.parse_args(['-xyzZ']) Namespace(x=True, y=True, z='Z') Invalid arguments ----------------- While parsing the command line, "parse_args()" checks for a variety of errors, including ambiguous options, invalid types, invalid options, wrong number of positional arguments, etc. When it encounters such an error, it exits and prints the error along with a usage message: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('--foo', type=int) >>> parser.add_argument('bar', nargs='?') >>> # invalid type >>> parser.parse_args(['--foo', 'spam']) usage: PROG [-h] [--foo FOO] [bar] PROG: error: argument --foo: invalid int value: 'spam' >>> # invalid option >>> parser.parse_args(['--bar']) usage: PROG [-h] [--foo FOO] [bar] PROG: error: no such option: --bar >>> # wrong number of arguments >>> parser.parse_args(['spam', 'badger']) usage: PROG [-h] [--foo FOO] [bar] PROG: error: extra arguments found: badger Arguments containing "-" ------------------------ The "parse_args()" method attempts to give errors whenever the user has clearly made a mistake, but some situations are inherently ambiguous. For example, the command-line argument "-1" could either be an attempt to specify an option or an attempt to provide a positional argument. 
The "parse_args()" method is cautious here: positional arguments may only begin with "-" if they look like negative numbers and there are no options in the parser that look like negative numbers: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-x') >>> parser.add_argument('foo', nargs='?') >>> # no negative number options, so -1 is a positional argument >>> parser.parse_args(['-x', '-1']) Namespace(foo=None, x='-1') >>> # no negative number options, so -1 and -5 are positional arguments >>> parser.parse_args(['-x', '-1', '-5']) Namespace(foo='-5', x='-1') >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-1', dest='one') >>> parser.add_argument('foo', nargs='?') >>> # negative number options present, so -1 is an option >>> parser.parse_args(['-1', 'X']) Namespace(foo=None, one='X') >>> # negative number options present, so -2 is an option >>> parser.parse_args(['-2']) usage: PROG [-h] [-1 ONE] [foo] PROG: error: no such option: -2 >>> # negative number options present, so both -1s are options >>> parser.parse_args(['-1', '-1']) usage: PROG [-h] [-1 ONE] [foo] PROG: error: argument -1: expected one argument If you have positional arguments that must begin with "-" and don’t look like negative numbers, you can insert the pseudo-argument "'--'" which tells "parse_args()" that everything after that is a positional argument: >>> parser.parse_args(['--', '-f']) Namespace(foo='-f', one=None) See also the argparse howto on ambiguous arguments for more details. Argument abbreviations (prefix matching) ---------------------------------------- The "parse_args()" method by default allows long options to be abbreviated to a prefix, if the abbreviation is unambiguous (the prefix matches a unique option): >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-bacon') >>> parser.add_argument('-badger') >>> parser.parse_args('-bac MMM'.split()) Namespace(bacon='MMM', badger=None) >>> parser.parse_args('-bad WOOD'.split()) Namespace(bacon=None, badger='WOOD') >>> parser.parse_args('-ba BA'.split()) usage: PROG [-h] [-bacon BACON] [-badger BADGER] PROG: error: ambiguous option: -ba could match -badger, -bacon An error is produced for arguments that could produce more than one options. This feature can be disabled by setting allow_abbrev to "False". Beyond "sys.argv" ----------------- Sometimes it may be useful to have an "ArgumentParser" parse arguments other than those of "sys.argv". This can be accomplished by passing a list of strings to "parse_args()". This is useful for testing at the interactive prompt: >>> parser = argparse.ArgumentParser() >>> parser.add_argument( ... 'integers', metavar='int', type=int, choices=range(10), ... nargs='+', help='an integer in the range 0..9') >>> parser.add_argument( ... '--sum', dest='accumulate', action='store_const', const=sum, ... default=max, help='sum the integers (default: find the max)') >>> parser.parse_args(['1', '2', '3', '4']) Namespace(accumulate=, integers=[1, 2, 3, 4]) >>> parser.parse_args(['1', '2', '3', '4', '--sum']) Namespace(accumulate=, integers=[1, 2, 3, 4]) The Namespace object -------------------- class argparse.Namespace Simple class used by default by "parse_args()" to create an object holding attributes and return it. This class is deliberately simple, just an "object" subclass with a readable string representation. 
If you prefer to have a dict-like view of the attributes, you can use the standard Python idiom, "vars()": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> args = parser.parse_args(['--foo', 'BAR']) >>> vars(args) {'foo': 'BAR'} It may also be useful to have an "ArgumentParser" assign attributes to an already existing object, rather than a new "Namespace" object. This can be achieved by specifying the "namespace=" keyword argument: >>> class C: ... pass ... >>> c = C() >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> parser.parse_args(args=['--foo', 'BAR'], namespace=c) >>> c.foo 'BAR' Other utilities =============== Sub-commands ------------ ArgumentParser.add_subparsers(*[, title][, description][, prog][, parser_class][, action][, dest][, required][, help][, metavar]) Many programs split up their functionality into a number of subcommands; for example, the "svn" program can invoke subcommands like "svn checkout", "svn update", and "svn commit". Splitting up functionality this way can be a particularly good idea when a program performs several different functions which require different kinds of command-line arguments. "ArgumentParser" supports the creation of such subcommands with the "add_subparsers()" method. The "add_subparsers()" method is normally called with no arguments and returns a special action object. This object has a single method, "add_parser()", which takes a command name and any "ArgumentParser" constructor arguments, and returns an "ArgumentParser" object that can be modified as usual. Description of parameters: * *title* - title for the sub-parser group in help output; by default “subcommands” if description is provided, otherwise uses title for positional arguments * *description* - description for the sub-parser group in help output, by default "None" * *prog* - usage information that will be displayed with sub-command help, by default the name of the program and any positional arguments before the subparser argument * *parser_class* - class which will be used to create sub-parser instances, by default the class of the current parser (e.g. "ArgumentParser")
"ArgumentParser") * action - the basic type of action to be taken when this argument is encountered at the command line * dest - name of the attribute under which sub-command name will be stored; by default "None" and no value is stored * required - Whether or not a subcommand must be provided, by default "False" (added in 3.7) * help - help for sub-parser group in help output, by default "None" * metavar - string presenting available subcommands in help; by default it is "None" and presents subcommands in form {cmd1, cmd2, ..} Some example usage: >>> # create the top-level parser >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('--foo', action='store_true', help='foo help') >>> subparsers = parser.add_subparsers(help='subcommand help') >>> >>> # create the parser for the "a" command >>> parser_a = subparsers.add_parser('a', help='a help') >>> parser_a.add_argument('bar', type=int, help='bar help') >>> >>> # create the parser for the "b" command >>> parser_b = subparsers.add_parser('b', help='b help') >>> parser_b.add_argument('--baz', choices=('X', 'Y', 'Z'), help='baz help') >>> >>> # parse some argument lists >>> parser.parse_args(['a', '12']) Namespace(bar=12, foo=False) >>> parser.parse_args(['--foo', 'b', '--baz', 'Z']) Namespace(baz='Z', foo=True) Note that the object returned by "parse_args()" will only contain attributes for the main parser and the subparser that was selected by the command line (and not any other subparsers). So in the example above, when the "a" command is specified, only the "foo" and "bar" attributes are present, and when the "b" command is specified, only the "foo" and "baz" attributes are present. Similarly, when a help message is requested from a subparser, only the help for that particular parser will be printed. The help message will not include parent parser or sibling parser messages. (A help message for each subparser command, however, can be given by supplying the "help=" argument to "add_parser()" as above.) >>> parser.parse_args(['--help']) usage: PROG [-h] [--foo] {a,b} ... positional arguments: {a,b} subcommand help a a help b b help options: -h, --help show this help message and exit --foo foo help >>> parser.parse_args(['a', '--help']) usage: PROG a [-h] bar positional arguments: bar bar help options: -h, --help show this help message and exit >>> parser.parse_args(['b', '--help']) usage: PROG b [-h] [--baz {X,Y,Z}] options: -h, --help show this help message and exit --baz {X,Y,Z} baz help The "add_subparsers()" method also supports "title" and "description" keyword arguments. When either is present, the subparser’s commands will appear in their own group in the help output. For example: >>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers(title='subcommands', ... description='valid subcommands', ... help='additional help') >>> subparsers.add_parser('foo') >>> subparsers.add_parser('bar') >>> parser.parse_args(['-h']) usage: [-h] {foo,bar} ... options: -h, --help show this help message and exit subcommands: valid subcommands {foo,bar} additional help Furthermore, "add_parser()" supports an additional *aliases* argument, which allows multiple strings to refer to the same subparser. 
This example, like "svn", aliases "co" as a shorthand for "checkout": >>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers() >>> checkout = subparsers.add_parser('checkout', aliases=['co']) >>> checkout.add_argument('foo') >>> parser.parse_args(['co', 'bar']) Namespace(foo='bar') "add_parser()" supports also an additional *deprecated* argument, which allows to deprecate the subparser. >>> import argparse >>> parser = argparse.ArgumentParser(prog='chicken.py') >>> subparsers = parser.add_subparsers() >>> run = subparsers.add_parser('run') >>> fly = subparsers.add_parser('fly', deprecated=True) >>> parser.parse_args(['fly']) chicken.py: warning: command 'fly' is deprecated Namespace() Added in version 3.13. One particularly effective way of handling subcommands is to combine the use of the "add_subparsers()" method with calls to "set_defaults()" so that each subparser knows which Python function it should execute. For example: >>> # subcommand functions >>> def foo(args): ... print(args.x * args.y) ... >>> def bar(args): ... print('((%s))' % args.z) ... >>> # create the top-level parser >>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers(required=True) >>> >>> # create the parser for the "foo" command >>> parser_foo = subparsers.add_parser('foo') >>> parser_foo.add_argument('-x', type=int, default=1) >>> parser_foo.add_argument('y', type=float) >>> parser_foo.set_defaults(func=foo) >>> >>> # create the parser for the "bar" command >>> parser_bar = subparsers.add_parser('bar') >>> parser_bar.add_argument('z') >>> parser_bar.set_defaults(func=bar) >>> >>> # parse the args and call whatever function was selected >>> args = parser.parse_args('foo 1 -x 2'.split()) >>> args.func(args) 2.0 >>> >>> # parse the args and call whatever function was selected >>> args = parser.parse_args('bar XYZYX'.split()) >>> args.func(args) ((XYZYX)) This way, you can let "parse_args()" do the job of calling the appropriate function after argument parsing is complete. Associating functions with actions like this is typically the easiest way to handle the different actions for each of your subparsers. However, if it is necessary to check the name of the subparser that was invoked, the "dest" keyword argument to the "add_subparsers()" call will work: >>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers(dest='subparser_name') >>> subparser1 = subparsers.add_parser('1') >>> subparser1.add_argument('-x') >>> subparser2 = subparsers.add_parser('2') >>> subparser2.add_argument('y') >>> parser.parse_args(['2', 'frobble']) Namespace(subparser_name='2', y='frobble') Changed in version 3.7: New *required* keyword-only parameter. FileType objects ---------------- class argparse.FileType(mode='r', bufsize=-1, encoding=None, errors=None) The "FileType" factory creates objects that can be passed to the type argument of "ArgumentParser.add_argument()". 
Arguments that have "FileType" objects as their type will open command-line arguments as files with the requested modes, buffer sizes, encodings and error handling (see the "open()" function for more details): >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--raw', type=argparse.FileType('wb', 0)) >>> parser.add_argument('out', type=argparse.FileType('w', encoding='UTF-8')) >>> parser.parse_args(['--raw', 'raw.dat', 'file.txt']) Namespace(out=<_io.TextIOWrapper name='file.txt' mode='w' encoding='UTF-8'>, raw=<_io.FileIO name='raw.dat' mode='wb'>) FileType objects understand the pseudo-argument "'-'" and automatically convert this into "sys.stdin" for readable "FileType" objects and "sys.stdout" for writable "FileType" objects: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('infile', type=argparse.FileType('r')) >>> parser.parse_args(['-']) Namespace(infile=<_io.TextIOWrapper name='' encoding='UTF-8'>) Changed in version 3.4: Added the *encodings* and *errors* parameters. Argument groups --------------- ArgumentParser.add_argument_group(title=None, description=None, *[, argument_default][, conflict_handler]) By default, "ArgumentParser" groups command-line arguments into “positional arguments” and “options” when displaying help messages. When there is a better conceptual grouping of arguments than this default one, appropriate groups can be created using the "add_argument_group()" method: >>> parser = argparse.ArgumentParser(prog='PROG', add_help=False) >>> group = parser.add_argument_group('group') >>> group.add_argument('--foo', help='foo help') >>> group.add_argument('bar', help='bar help') >>> parser.print_help() usage: PROG [--foo FOO] bar group: bar bar help --foo FOO foo help The "add_argument_group()" method returns an argument group object which has an "add_argument()" method just like a regular "ArgumentParser". When an argument is added to the group, the parser treats it just like a normal argument, but displays the argument in a separate group for help messages. The "add_argument_group()" method accepts *title* and *description* arguments which can be used to customize this display: >>> parser = argparse.ArgumentParser(prog='PROG', add_help=False) >>> group1 = parser.add_argument_group('group1', 'group1 description') >>> group1.add_argument('foo', help='foo help') >>> group2 = parser.add_argument_group('group2', 'group2 description') >>> group2.add_argument('--bar', help='bar help') >>> parser.print_help() usage: PROG [--bar BAR] foo group1: group1 description foo foo help group2: group2 description --bar BAR bar help The optional, keyword-only parameters argument_default and conflict_handler allow for finer-grained control of the behavior of the argument group. These parameters have the same meaning as in the "ArgumentParser" constructor, but apply specifically to the argument group rather than the entire parser. Note that any arguments not in your user-defined groups will end up back in the usual “positional arguments” and “optional arguments” sections. Changed in version 3.11: Calling "add_argument_group()" on an argument group is deprecated. This feature was never supported and does not always work correctly. The function exists on the API by accident through inheritance and will be removed in the future. Mutual exclusion ---------------- ArgumentParser.add_mutually_exclusive_group(required=False) Create a mutually exclusive group. 
"argparse" will make sure that only one of the arguments in the mutually exclusive group was present on the command line: >>> parser = argparse.ArgumentParser(prog='PROG') >>> group = parser.add_mutually_exclusive_group() >>> group.add_argument('--foo', action='store_true') >>> group.add_argument('--bar', action='store_false') >>> parser.parse_args(['--foo']) Namespace(bar=True, foo=True) >>> parser.parse_args(['--bar']) Namespace(bar=False, foo=False) >>> parser.parse_args(['--foo', '--bar']) usage: PROG [-h] [--foo | --bar] PROG: error: argument --bar: not allowed with argument --foo The "add_mutually_exclusive_group()" method also accepts a *required* argument, to indicate that at least one of the mutually exclusive arguments is required: >>> parser = argparse.ArgumentParser(prog='PROG') >>> group = parser.add_mutually_exclusive_group(required=True) >>> group.add_argument('--foo', action='store_true') >>> group.add_argument('--bar', action='store_false') >>> parser.parse_args([]) usage: PROG [-h] (--foo | --bar) PROG: error: one of the arguments --foo --bar is required Note that currently mutually exclusive argument groups do not support the *title* and *description* arguments of "add_argument_group()". However, a mutually exclusive group can be added to an argument group that has a title and description. For example: >>> parser = argparse.ArgumentParser(prog='PROG') >>> group = parser.add_argument_group('Group title', 'Group description') >>> exclusive_group = group.add_mutually_exclusive_group(required=True) >>> exclusive_group.add_argument('--foo', help='foo help') >>> exclusive_group.add_argument('--bar', help='bar help') >>> parser.print_help() usage: PROG [-h] (--foo FOO | --bar BAR) options: -h, --help show this help message and exit Group title: Group description --foo FOO foo help --bar BAR bar help Changed in version 3.11: Calling "add_argument_group()" or "add_mutually_exclusive_group()" on a mutually exclusive group is deprecated. These features were never supported and do not always work correctly. The functions exist on the API by accident through inheritance and will be removed in the future. Parser defaults --------------- ArgumentParser.set_defaults(**kwargs) Most of the time, the attributes of the object returned by "parse_args()" will be fully determined by inspecting the command- line arguments and the argument actions. "set_defaults()" allows some additional attributes that are determined without any inspection of the command line to be added: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('foo', type=int) >>> parser.set_defaults(bar=42, baz='badger') >>> parser.parse_args(['736']) Namespace(bar=42, baz='badger', foo=736) Note that parser-level defaults always override argument-level defaults: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default='bar') >>> parser.set_defaults(foo='spam') >>> parser.parse_args([]) Namespace(foo='spam') Parser-level defaults can be particularly useful when working with multiple parsers. See the "add_subparsers()" method for an example of this type. ArgumentParser.get_default(dest) Get the default value for a namespace attribute, as set by either "add_argument()" or by "set_defaults()": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default='badger') >>> parser.get_default('foo') 'badger' Printing help ------------- In most typical applications, "parse_args()" will take care of formatting and printing any usage or error messages. 
However, several formatting methods are available: ArgumentParser.print_usage(file=None) Print a brief description of how the "ArgumentParser" should be invoked on the command line. If *file* is "None", "sys.stdout" is assumed. ArgumentParser.print_help(file=None) Print a help message, including the program usage and information about the arguments registered with the "ArgumentParser". If *file* is "None", "sys.stdout" is assumed. There are also variants of these methods that simply return a string instead of printing it: ArgumentParser.format_usage() Return a string containing a brief description of how the "ArgumentParser" should be invoked on the command line. ArgumentParser.format_help() Return a string containing a help message, including the program usage and information about the arguments registered with the "ArgumentParser". Partial parsing --------------- ArgumentParser.parse_known_args(args=None, namespace=None) Sometimes a script only needs to handle a specific set of command-line arguments, leaving any unrecognized arguments for another script or program. In these cases, the "parse_known_args()" method can be useful. This method works similarly to "parse_args()", but it does not raise an error for extra, unrecognized arguments. Instead, it parses the known arguments and returns a two item tuple that contains the populated namespace and the list of any unrecognized arguments. >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action='store_true') >>> parser.add_argument('bar') >>> parser.parse_known_args(['--foo', '--badger', 'BAR', 'spam']) (Namespace(bar='BAR', foo=True), ['--badger', 'spam']) Warning: Prefix matching rules apply to "parse_known_args()". The parser may consume an option even if it’s just a prefix of one of its known options, instead of leaving it in the remaining arguments list. Customizing file parsing ------------------------ ArgumentParser.convert_arg_line_to_args(arg_line) Arguments that are read from a file (see the *fromfile_prefix_chars* keyword argument to the "ArgumentParser" constructor) are read one argument per line. "convert_arg_line_to_args()" can be overridden for fancier reading. This method takes a single argument *arg_line* which is a string read from the argument file. It returns a list of arguments parsed from this string. The method is called once per line read from the argument file, in order. A useful override of this method is one that treats each space-separated word as an argument. The following example demonstrates how to do this: class MyArgumentParser(argparse.ArgumentParser): def convert_arg_line_to_args(self, arg_line): return arg_line.split() Exiting methods --------------- ArgumentParser.exit(status=0, message=None) This method terminates the program, exiting with the specified *status* and, if given, it prints a *message* to "sys.stderr" before that. The user can override this method to handle these steps differently: class ErrorCatchingArgumentParser(argparse.ArgumentParser): def exit(self, status=0, message=None): if status: raise Exception(f'Exiting because of an error: {message}') exit(status) ArgumentParser.error(message) This method prints a usage message, including the *message*, to "sys.stderr" and terminates the program with a status code of 2.
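Overriding "error()" in the same spirit keeps a parser from terminating the interpreter on bad input, which can be convenient in tests. A minimal sketch, assuming you would rather get a "ValueError" than an exit (the class name is illustrative, not part of "argparse"):

class RaisingArgumentParser(argparse.ArgumentParser):
    def error(self, message):
        # Raise instead of printing the usage message and exiting with status 2.
        raise ValueError(f'parse error: {message}')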
Intermixed parsing ------------------ ArgumentParser.parse_intermixed_args(args=None, namespace=None) ArgumentParser.parse_known_intermixed_args(args=None, namespace=None) A number of Unix commands allow the user to intermix optional arguments with positional arguments. The "parse_intermixed_args()" and "parse_known_intermixed_args()" methods support this parsing style. These parsers do not support all the "argparse" features, and will raise exceptions if unsupported features are used. In particular, subparsers and mutually exclusive groups that include both optionals and positionals are not supported. The following example shows the difference between "parse_known_args()" and "parse_intermixed_args()": the former returns "['2', '3']" as unparsed arguments, while the latter collects all the positionals into "rest". >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> parser.add_argument('cmd') >>> parser.add_argument('rest', nargs='*', type=int) >>> parser.parse_known_args('doit 1 --foo bar 2 3'.split()) (Namespace(cmd='doit', foo='bar', rest=[1]), ['2', '3']) >>> parser.parse_intermixed_args('doit 1 --foo bar 2 3'.split()) Namespace(cmd='doit', foo='bar', rest=[1, 2, 3]) "parse_known_intermixed_args()" returns a two item tuple containing the populated namespace and the list of remaining argument strings. "parse_intermixed_args()" raises an error if there are any remaining unparsed argument strings. Added in version 3.7. Registering custom types or actions ----------------------------------- ArgumentParser.register(registry_name, value, object) Sometimes it’s desirable to use a custom string in error messages to provide more user-friendly output. In these cases, "register()" can be used to register custom actions or types with a parser, allowing you to reference them by their registered name instead of their callable name. The "register()" method accepts three arguments - a *registry_name*, specifying the internal registry where the object will be stored (e.g., "action", "type"), *value*, which is the key under which the object will be registered, and *object*, the callable to be registered. The following example shows how to register a custom type with a parser: >>> import argparse >>> parser = argparse.ArgumentParser() >>> parser.register('type', 'hexadecimal integer', lambda s: int(s, 16)) >>> parser.add_argument('--foo', type='hexadecimal integer') _StoreAction(option_strings=['--foo'], dest='foo', nargs=None, const=None, default=None, type='hexadecimal integer', choices=None, required=False, help=None, metavar=None, deprecated=False) >>> parser.parse_args(['--foo', '0xFA']) Namespace(foo=250) >>> parser.parse_args(['--foo', '1.2']) usage: PROG [-h] [--foo FOO] PROG: error: argument --foo: invalid 'hexadecimal integer' value: '1.2' Exceptions ========== exception argparse.ArgumentError An error from creating or using an argument (optional or positional). The string value of this exception is the message, augmented with information about the argument that caused it. exception argparse.ArgumentTypeError Raised when something goes wrong converting a command line string to a type. -[ Guides and Tutorials ]- * Argparse Tutorial * Migrating "optparse" code to "argparse" "array" — Efficient arrays of numeric values ******************************************** ====================================================================== This module defines an object type which can compactly represent an array of basic values: characters, integers, floating-point numbers.
Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time by using a *type code*, which is a single character. The following type codes are defined:

+-------------+----------------------+---------------------+-------------------------+---------+
| Type code   | C Type               | Python Type         | Minimum size in bytes   | Notes   |
|=============|======================|=====================|=========================|=========|
| "'b'"       | signed char          | int                 | 1                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'B'"       | unsigned char        | int                 | 1                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'u'"       | wchar_t              | Unicode character   | 2                       | (1)     |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'w'"       | Py_UCS4              | Unicode character   | 4                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'h'"       | signed short         | int                 | 2                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'H'"       | unsigned short       | int                 | 2                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'i'"       | signed int           | int                 | 2                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'I'"       | unsigned int         | int                 | 2                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'l'"       | signed long          | int                 | 4                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'L'"       | unsigned long        | int                 | 4                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'q'"       | signed long long     | int                 | 8                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'Q'"       | unsigned long long   | int                 | 8                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'f'"       | float                | float               | 4                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+
| "'d'"       | double               | float               | 8                       |         |
+-------------+----------------------+---------------------+-------------------------+---------+

Notes: 1. It can be 16 bits or 32 bits depending on the platform. Changed in version 3.9: "array('u')" now uses "wchar_t" as C type instead of deprecated "Py_UNICODE". This change doesn’t affect its behavior because "Py_UNICODE" is an alias of "wchar_t" since Python 3.3. Deprecated since version 3.3, will be removed in version 3.16: Please migrate to the "'w'" typecode. The actual representation of values is determined by the machine architecture (strictly speaking, by the C implementation). The actual size can be accessed through the "array.itemsize" attribute. The module defines the following item: array.typecodes A string with all available type codes. The module defines the following type: class array.array(typecode[, initializer]) A new array whose items are restricted by *typecode*, and initialized from the optional *initializer* value, which must be a "bytes" or "bytearray" object, a Unicode string, or an iterable over elements of the appropriate type.
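A quick illustration of constructing and indexing an array (an illustrative doctest; the values are arbitrary):

>>> from array import array
>>> a = array('i', [1, 2, 3])   # signed integers from an iterable
>>> a[1]
2
>>> a
array('i', [1, 2, 3])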
If given a "bytes" or "bytearray" object, the initializer is passed to the new array’s "frombytes()" method; if given a Unicode string, the initializer is passed to the "fromunicode()" method; otherwise, the initializer’s iterator is passed to the "extend()" method to add initial items to the array. Array objects support the ordinary sequence operations of indexing, slicing, concatenation, and multiplication. When using slice assignment, the assigned value must be an array object with the same type code; in all other cases, "TypeError" is raised. Array objects also implement the buffer interface, and may be used wherever *bytes-like objects* are supported. Raises an auditing event "array.__new__" with arguments "typecode", "initializer". typecode The typecode character used to create the array. itemsize The length in bytes of one array item in the internal representation. append(x) Append a new item with value *x* to the end of the array. buffer_info() Return a tuple "(address, length)" giving the current memory address and the length in elements of the buffer used to hold array’s contents. The size of the memory buffer in bytes can be computed as "array.buffer_info()[1] * array.itemsize". This is occasionally useful when working with low-level (and inherently unsafe) I/O interfaces that require memory addresses, such as certain "ioctl()" operations. The returned numbers are valid as long as the array exists and no length-changing operations are applied to it. Note: When using array objects from code written in C or C++ (the only way to effectively make use of this information), it makes more sense to use the buffer interface supported by array objects. This method is maintained for backward compatibility and should be avoided in new code. The buffer interface is documented in Buffer Protocol. byteswap() “Byteswap” all items of the array. This is only supported for values which are 1, 2, 4, or 8 bytes in size; for other types of values, "RuntimeError" is raised. It is useful when reading data from a file written on a machine with a different byte order. count(x) Return the number of occurrences of *x* in the array. extend(iterable) Append items from *iterable* to the end of the array. If *iterable* is another array, it must have *exactly* the same type code; if not, "TypeError" will be raised. If *iterable* is not an array, it must be iterable and its elements must be the right type to be appended to the array. frombytes(buffer) Appends items from the *bytes-like object*, interpreting its content as an array of machine values (as if it had been read from a file using the "fromfile()" method). Added in version 3.2: "fromstring()" is renamed to "frombytes()" for clarity. fromfile(f, n) Read *n* items (as machine values) from the *file object* *f* and append them to the end of the array. If less than *n* items are available, "EOFError" is raised, but the items that were available are still inserted into the array. fromlist(list) Append items from the list. This is equivalent to "for x in list: a.append(x)" except that if there is a type error, the array is unchanged. fromunicode(s) Extends this array with data from the given Unicode string. The array must have type code "'u'" or "'w'"; otherwise a "ValueError" is raised. Use "array.frombytes(unicodestring.encode(enc))" to append Unicode data to an array of some other type. index(x[, start[, stop]]) Return the smallest *i* such that *i* is the index of the first occurrence of *x* in the array. 
The optional arguments *start* and *stop* can be specified to search for *x* within a subsection of the array. Raise "ValueError" if *x* is not found. Changed in version 3.10: Added optional *start* and *stop* parameters. insert(i, x) Insert a new item with value *x* in the array before position *i*. Negative values are treated as being relative to the end of the array. pop([i]) Removes the item with the index *i* from the array and returns it. The optional argument defaults to "-1", so that by default the last item is removed and returned. remove(x) Remove the first occurrence of *x* from the array. clear() Remove all elements from the array. Added in version 3.13. reverse() Reverse the order of the items in the array. tobytes() Convert the array to an array of machine values and return the bytes representation (the same sequence of bytes that would be written to a file by the "tofile()" method.) Added in version 3.2: "tostring()" is renamed to "tobytes()" for clarity. tofile(f) Write all items (as machine values) to the *file object* *f*. tolist() Convert the array to an ordinary list with the same items. tounicode() Convert the array to a Unicode string. The array must have a type "'u'" or "'w'"; otherwise a "ValueError" is raised. Use "array.tobytes().decode(enc)" to obtain a Unicode string from an array of some other type. The string representation of array objects has the form "array(typecode, initializer)". The *initializer* is omitted if the array is empty, otherwise it is a Unicode string if the *typecode* is "'u'" or "'w'", otherwise it is a list of numbers. The string representation is guaranteed to be able to be converted back to an array with the same type and value using "eval()", so long as the "array" class has been imported using "from array import array". Variables "inf" and "nan" must also be defined if it contains corresponding floating-point values. Examples: array('l') array('w', 'hello \u2641') array('l', [1, 2, 3, 4, 5]) array('d', [1.0, 2.0, 3.14, -inf, nan]) See also: Module "struct" Packing and unpacking of heterogeneous binary data. NumPy The NumPy package defines another array type. "ast" — Abstract Syntax Trees ***************************** **Source code:** Lib/ast.py ====================================================================== The "ast" module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like. An abstract syntax tree can be generated by passing "ast.PyCF_ONLY_AST" as a flag to the "compile()" built-in function, or using the "parse()" helper provided in this module. The result will be a tree of objects whose classes all inherit from "ast.AST". An abstract syntax tree can be compiled into a Python code object using the built-in "compile()" function. Abstract Grammar ================ The abstract grammar is currently defined as follows: -- ASDL's 4 builtin types are: -- identifier, int, string, constant module Python { mod = Module(stmt* body, type_ignore* type_ignores) | Interactive(stmt* body) | Expression(expr body) | FunctionType(expr* argtypes, expr returns) stmt = FunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list, expr? returns, string? type_comment, type_param* type_params) | AsyncFunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list, expr? returns, string? 
type_comment, type_param* type_params) | ClassDef(identifier name, expr* bases, keyword* keywords, stmt* body, expr* decorator_list, type_param* type_params) | Return(expr? value) | Delete(expr* targets) | Assign(expr* targets, expr value, string? type_comment) | TypeAlias(expr name, type_param* type_params, expr value) | AugAssign(expr target, operator op, expr value) -- 'simple' indicates that we annotate simple name without parens | AnnAssign(expr target, expr annotation, expr? value, int simple) -- use 'orelse' because else is a keyword in target languages | For(expr target, expr iter, stmt* body, stmt* orelse, string? type_comment) | AsyncFor(expr target, expr iter, stmt* body, stmt* orelse, string? type_comment) | While(expr test, stmt* body, stmt* orelse) | If(expr test, stmt* body, stmt* orelse) | With(withitem* items, stmt* body, string? type_comment) | AsyncWith(withitem* items, stmt* body, string? type_comment) | Match(expr subject, match_case* cases) | Raise(expr? exc, expr? cause) | Try(stmt* body, excepthandler* handlers, stmt* orelse, stmt* finalbody) | TryStar(stmt* body, excepthandler* handlers, stmt* orelse, stmt* finalbody) | Assert(expr test, expr? msg) | Import(alias* names) | ImportFrom(identifier? module, alias* names, int? level) | Global(identifier* names) | Nonlocal(identifier* names) | Expr(expr value) | Pass | Break | Continue -- col_offset is the byte offset in the utf8 string the parser uses attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) -- BoolOp() can use left & right? expr = BoolOp(boolop op, expr* values) | NamedExpr(expr target, expr value) | BinOp(expr left, operator op, expr right) | UnaryOp(unaryop op, expr operand) | Lambda(arguments args, expr body) | IfExp(expr test, expr body, expr orelse) | Dict(expr* keys, expr* values) | Set(expr* elts) | ListComp(expr elt, comprehension* generators) | SetComp(expr elt, comprehension* generators) | DictComp(expr key, expr value, comprehension* generators) | GeneratorExp(expr elt, comprehension* generators) -- the grammar constrains where yield expressions can occur | Await(expr value) | Yield(expr? value) | YieldFrom(expr value) -- need sequences for compare to distinguish between -- x < 4 < 3 and (x < 4) < 3 | Compare(expr left, cmpop* ops, expr* comparators) | Call(expr func, expr* args, keyword* keywords) | FormattedValue(expr value, int conversion, expr? format_spec) | JoinedStr(expr* values) | Constant(constant value, string? kind) -- the following expression can appear in assignment context | Attribute(expr value, identifier attr, expr_context ctx) | Subscript(expr value, expr slice, expr_context ctx) | Starred(expr value, expr_context ctx) | Name(identifier id, expr_context ctx) | List(expr* elts, expr_context ctx) | Tuple(expr* elts, expr_context ctx) -- can appear only in Subscript | Slice(expr? lower, expr? upper, expr? step) -- col_offset is the byte offset in the utf8 string the parser uses attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) expr_context = Load | Store | Del boolop = And | Or operator = Add | Sub | Mult | MatMult | Div | Mod | Pow | LShift | RShift | BitOr | BitXor | BitAnd | FloorDiv unaryop = Invert | Not | UAdd | USub cmpop = Eq | NotEq | Lt | LtE | Gt | GtE | Is | IsNot | In | NotIn comprehension = (expr target, expr iter, expr* ifs, int is_async) excepthandler = ExceptHandler(expr? type, identifier? name, stmt* body) attributes (int lineno, int col_offset, int? end_lineno, int? 
end_col_offset) arguments = (arg* posonlyargs, arg* args, arg? vararg, arg* kwonlyargs, expr* kw_defaults, arg? kwarg, expr* defaults) arg = (identifier arg, expr? annotation, string? type_comment) attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) -- keyword arguments supplied to call (NULL identifier for **kwargs) keyword = (identifier? arg, expr value) attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) -- import name with optional 'as' alias. alias = (identifier name, identifier? asname) attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) withitem = (expr context_expr, expr? optional_vars) match_case = (pattern pattern, expr? guard, stmt* body) pattern = MatchValue(expr value) | MatchSingleton(constant value) | MatchSequence(pattern* patterns) | MatchMapping(expr* keys, pattern* patterns, identifier? rest) | MatchClass(expr cls, pattern* patterns, identifier* kwd_attrs, pattern* kwd_patterns) | MatchStar(identifier? name) -- The optional "rest" MatchMapping parameter handles capturing extra mapping keys | MatchAs(pattern? pattern, identifier? name) | MatchOr(pattern* patterns) attributes (int lineno, int col_offset, int end_lineno, int end_col_offset) type_ignore = TypeIgnore(int lineno, string tag) type_param = TypeVar(identifier name, expr? bound, expr? default_value) | ParamSpec(identifier name, expr? default_value) | TypeVarTuple(identifier name, expr? default_value) attributes (int lineno, int col_offset, int end_lineno, int end_col_offset) } Node classes ============ class ast.AST This is the base of all AST node classes. The actual node classes are derived from the "Parser/Python.asdl" file, which is reproduced above. They are defined in the "_ast" C module and re-exported in "ast". There is one class defined for each left-hand side symbol in the abstract grammar (for example, "ast.stmt" or "ast.expr"). In addition, there is one class defined for each constructor on the right-hand side; these classes inherit from the classes for the left-hand side trees. For example, "ast.BinOp" inherits from "ast.expr". For production rules with alternatives (aka “sums”), the left-hand side class is abstract: only instances of specific constructor nodes are ever created. _fields Each concrete class has an attribute "_fields" which gives the names of all child nodes. Each instance of a concrete class has one attribute for each child node, of the type as defined in the grammar. For example, "ast.BinOp" instances have an attribute "left" of type "ast.expr". If these attributes are marked as optional in the grammar (using a question mark), the value might be "None". If the attributes can have zero-or-more values (marked with an asterisk), the values are represented as Python lists. All possible attributes must be present and have valid values when compiling an AST with "compile()". _field_types The "_field_types" attribute on each concrete class is a dictionary mapping field names (as also listed in "_fields") to their types. >>> ast.TypeVar._field_types {'name': <class 'str'>, 'bound': ast.expr | None, 'default_value': ast.expr | None} Added in version 3.13. lineno col_offset end_lineno end_col_offset Instances of "ast.expr" and "ast.stmt" subclasses have "lineno", "col_offset", "end_lineno", and "end_col_offset" attributes.
The "lineno" and "end_lineno" are the first and last line numbers of source text span (1-indexed so the first line is line 1) and the "col_offset" and "end_col_offset" are the corresponding UTF-8 byte offsets of the first and last tokens that generated the node. The UTF-8 offset is recorded because the parser uses UTF-8 internally. Note that the end positions are not required by the compiler and are therefore optional. The end offset is *after* the last symbol, for example one can get the source segment of a one-line expression node using "source_line[node.col_offset : node.end_col_offset]". The constructor of a class "ast.T" parses its arguments as follows: * If there are positional arguments, there must be as many as there are items in "T._fields"; they will be assigned as attributes of these names. * If there are keyword arguments, they will set the attributes of the same names to the given values. For example, to create and populate an "ast.UnaryOp" node, you could use node = ast.UnaryOp(ast.USub(), ast.Constant(5, lineno=0, col_offset=0), lineno=0, col_offset=0) If a field that is optional in the grammar is omitted from the constructor, it defaults to "None". If a list field is omitted, it defaults to the empty list. If a field of type "ast.expr_context" is omitted, it defaults to "Load()". If any other field is omitted, a "DeprecationWarning" is raised and the AST node will not have this field. In Python 3.15, this condition will raise an error. Changed in version 3.8: Class "ast.Constant" is now used for all constants. Changed in version 3.9: Simple indices are represented by their value, extended slices are represented as tuples. Deprecated since version 3.8: Old classes "ast.Num", "ast.Str", "ast.Bytes", "ast.NameConstant" and "ast.Ellipsis" are still available, but they will be removed in future Python releases. In the meantime, instantiating them will return an instance of a different class. Deprecated since version 3.9: Old classes "ast.Index" and "ast.ExtSlice" are still available, but they will be removed in future Python releases. In the meantime, instantiating them will return an instance of a different class. Deprecated since version 3.13, will be removed in version 3.15: Previous versions of Python allowed the creation of AST nodes that were missing required fields. Similarly, AST node constructors allowed arbitrary keyword arguments that were set as attributes of the AST node, even if they did not match any of the fields of the AST node. This behavior is deprecated and will be removed in Python 3.15. Note: The descriptions of the specific node classes displayed here were initially adapted from the fantastic Green Tree Snakes project and all its contributors. Root nodes ---------- class ast.Module(body, type_ignores) A Python module, as with file input. Node type generated by "ast.parse()" in the default ""exec"" *mode*. "body" is a "list" of the module’s Statements. "type_ignores" is a "list" of the module’s type ignore comments; see "ast.parse()" for more details. >>> print(ast.dump(ast.parse('x = 1'), indent=4)) Module( body=[ Assign( targets=[ Name(id='x', ctx=Store())], value=Constant(value=1))]) class ast.Expression(body) A single Python expression input. Node type generated by "ast.parse()" when *mode* is ""eval"". "body" is a single node, one of the expression types. >>> print(ast.dump(ast.parse('123', mode='eval'), indent=4)) Expression( body=Constant(value=123)) class ast.Interactive(body) A single interactive input, like in Interactive Mode. 
Node type generated by "ast.parse()" when *mode* is ""single"". "body" is a "list" of statement nodes. >>> print(ast.dump(ast.parse('x = 1; y = 2', mode='single'), indent=4)) Interactive( body=[ Assign( targets=[ Name(id='x', ctx=Store())], value=Constant(value=1)), Assign( targets=[ Name(id='y', ctx=Store())], value=Constant(value=2))]) class ast.FunctionType(argtypes, returns) A representation of old-style type comments for functions, as Python versions prior to 3.5 didn’t support **PEP 484** annotations. Node type generated by "ast.parse()" when *mode* is ""func_type"". Such type comments would look like this: def sum_two_numbers(a, b): # type: (int, int) -> int return a + b "argtypes" is a "list" of expression nodes. "returns" is a single expression node. >>> print(ast.dump(ast.parse('(int, str) -> List[int]', mode='func_type'), indent=4)) FunctionType( argtypes=[ Name(id='int', ctx=Load()), Name(id='str', ctx=Load())], returns=Subscript( value=Name(id='List', ctx=Load()), slice=Name(id='int', ctx=Load()), ctx=Load())) Added in version 3.8. Literals -------- class ast.Constant(value) A constant value. The "value" attribute of the "Constant" literal contains the Python object it represents. The values represented can be instances of "str", "bytes", "int", "float", "complex", and "bool", and the constants "None" and "Ellipsis". >>> print(ast.dump(ast.parse('123', mode='eval'), indent=4)) Expression( body=Constant(value=123)) class ast.FormattedValue(value, conversion, format_spec) Node representing a single formatting field in an f-string. If the string contains a single formatting field and nothing else, the node can be isolated; otherwise, it appears in "JoinedStr". * "value" is any expression node (such as a literal, a variable, or a function call). * "conversion" is an integer: * -1: no formatting * 115: "!s" string formatting * 114: "!r" repr formatting * 97: "!a" ascii formatting * "format_spec" is a "JoinedStr" node representing the formatting of the value, or "None" if no format was specified. Both "conversion" and "format_spec" can be set at the same time. class ast.JoinedStr(values) An f-string, comprising a series of "FormattedValue" and "Constant" nodes. >>> print(ast.dump(ast.parse('f"sin({a}) is {sin(a):.3}"', mode='eval'), indent=4)) Expression( body=JoinedStr( values=[ Constant(value='sin('), FormattedValue( value=Name(id='a', ctx=Load()), conversion=-1), Constant(value=') is '), FormattedValue( value=Call( func=Name(id='sin', ctx=Load()), args=[ Name(id='a', ctx=Load())]), conversion=-1, format_spec=JoinedStr( values=[ Constant(value='.3')]))])) class ast.List(elts, ctx) class ast.Tuple(elts, ctx) A list or tuple. "elts" holds a list of nodes representing the elements. "ctx" is "Store" if the container is an assignment target (i.e. "(x,y)=something"), and "Load" otherwise. >>> print(ast.dump(ast.parse('[1, 2, 3]', mode='eval'), indent=4)) Expression( body=List( elts=[ Constant(value=1), Constant(value=2), Constant(value=3)], ctx=Load())) >>> print(ast.dump(ast.parse('(1, 2, 3)', mode='eval'), indent=4)) Expression( body=Tuple( elts=[ Constant(value=1), Constant(value=2), Constant(value=3)], ctx=Load())) class ast.Set(elts) A set. "elts" holds a list of nodes representing the set’s elements. >>> print(ast.dump(ast.parse('{1, 2, 3}', mode='eval'), indent=4)) Expression( body=Set( elts=[ Constant(value=1), Constant(value=2), Constant(value=3)])) class ast.Dict(keys, values) A dictionary.
"keys" and "values" hold lists of nodes representing the keys and the values respectively, in matching order (what would be returned when calling "dictionary.keys()" and "dictionary.values()"). When doing dictionary unpacking using dictionary literals the expression to be expanded goes in the "values" list, with a "None" at the corresponding position in "keys". >>> print(ast.dump(ast.parse('{"a":1, **d}', mode='eval'), indent=4)) Expression( body=Dict( keys=[ Constant(value='a'), None], values=[ Constant(value=1), Name(id='d', ctx=Load())])) Variables --------- class ast.Name(id, ctx) A variable name. "id" holds the name as a string, and "ctx" is one of the following types. class ast.Load class ast.Store class ast.Del Variable references can be used to load the value of a variable, to assign a new value to it, or to delete it. Variable references are given a context to distinguish these cases. >>> print(ast.dump(ast.parse('a'), indent=4)) Module( body=[ Expr( value=Name(id='a', ctx=Load()))]) >>> print(ast.dump(ast.parse('a = 1'), indent=4)) Module( body=[ Assign( targets=[ Name(id='a', ctx=Store())], value=Constant(value=1))]) >>> print(ast.dump(ast.parse('del a'), indent=4)) Module( body=[ Delete( targets=[ Name(id='a', ctx=Del())])]) class ast.Starred(value, ctx) A "*var" variable reference. "value" holds the variable, typically a "Name" node. This type must be used when building a "Call" node with "*args". >>> print(ast.dump(ast.parse('a, *b = it'), indent=4)) Module( body=[ Assign( targets=[ Tuple( elts=[ Name(id='a', ctx=Store()), Starred( value=Name(id='b', ctx=Store()), ctx=Store())], ctx=Store())], value=Name(id='it', ctx=Load()))]) Expressions ----------- class ast.Expr(value) When an expression, such as a function call, appears as a statement by itself with its return value not used or stored, it is wrapped in this container. "value" holds one of the other nodes in this section, a "Constant", a "Name", a "Lambda", a "Yield" or "YieldFrom" node. >>> print(ast.dump(ast.parse('-a'), indent=4)) Module( body=[ Expr( value=UnaryOp( op=USub(), operand=Name(id='a', ctx=Load())))]) class ast.UnaryOp(op, operand) A unary operation. "op" is the operator, and "operand" any expression node. class ast.UAdd class ast.USub class ast.Not class ast.Invert Unary operator tokens. "Not" is the "not" keyword, "Invert" is the "~" operator. >>> print(ast.dump(ast.parse('not x', mode='eval'), indent=4)) Expression( body=UnaryOp( op=Not(), operand=Name(id='x', ctx=Load()))) class ast.BinOp(left, op, right) A binary operation (like addition or division). "op" is the operator, and "left" and "right" are any expression nodes. >>> print(ast.dump(ast.parse('x + y', mode='eval'), indent=4)) Expression( body=BinOp( left=Name(id='x', ctx=Load()), op=Add(), right=Name(id='y', ctx=Load()))) class ast.Add class ast.Sub class ast.Mult class ast.Div class ast.FloorDiv class ast.Mod class ast.Pow class ast.LShift class ast.RShift class ast.BitOr class ast.BitXor class ast.BitAnd class ast.MatMult Binary operator tokens. class ast.BoolOp(op, values) A boolean operation, ‘or’ or ‘and’. "op" is "Or" or "And". "values" are the values involved. Consecutive operations with the same operator, such as "a or b or c", are collapsed into one node with several values. This doesn’t include "not", which is a "UnaryOp". >>> print(ast.dump(ast.parse('x or y', mode='eval'), indent=4)) Expression( body=BoolOp( op=Or(), values=[ Name(id='x', ctx=Load()), Name(id='y', ctx=Load())])) class ast.And class ast.Or Boolean operator tokens. 
class ast.Compare(left, ops, comparators) A comparison of two or more values. "left" is the first value in the comparison, "ops" the list of operators, and "comparators" the list of values after the first element in the comparison. >>> print(ast.dump(ast.parse('1 <= a < 10', mode='eval'), indent=4)) Expression( body=Compare( left=Constant(value=1), ops=[ LtE(), Lt()], comparators=[ Name(id='a', ctx=Load()), Constant(value=10)])) class ast.Eq class ast.NotEq class ast.Lt class ast.LtE class ast.Gt class ast.GtE class ast.Is class ast.IsNot class ast.In class ast.NotIn Comparison operator tokens. class ast.Call(func, args, keywords) A function call. "func" is the function, which will often be a "Name" or "Attribute" object. Of the arguments: * "args" holds a list of the arguments passed by position. * "keywords" holds a list of "keyword" objects representing arguments passed by keyword. The "args" and "keywords" arguments are optional and default to empty lists. >>> print(ast.dump(ast.parse('func(a, b=c, *d, **e)', mode='eval'), indent=4)) Expression( body=Call( func=Name(id='func', ctx=Load()), args=[ Name(id='a', ctx=Load()), Starred( value=Name(id='d', ctx=Load()), ctx=Load())], keywords=[ keyword( arg='b', value=Name(id='c', ctx=Load())), keyword( value=Name(id='e', ctx=Load()))])) class ast.keyword(arg, value) A keyword argument to a function call or class definition. "arg" is a raw string of the parameter name, "value" is a node to pass in. class ast.IfExp(test, body, orelse) An expression such as "a if b else c". Each field holds a single node, so in the following example, all three are "Name" nodes. >>> print(ast.dump(ast.parse('a if b else c', mode='eval'), indent=4)) Expression( body=IfExp( test=Name(id='b', ctx=Load()), body=Name(id='a', ctx=Load()), orelse=Name(id='c', ctx=Load()))) class ast.Attribute(value, attr, ctx) Attribute access, e.g. "d.keys". "value" is a node, typically a "Name". "attr" is a bare string giving the name of the attribute, and "ctx" is "Load", "Store" or "Del" according to how the attribute is acted on. >>> print(ast.dump(ast.parse('snake.colour', mode='eval'), indent=4)) Expression( body=Attribute( value=Name(id='snake', ctx=Load()), attr='colour', ctx=Load())) class ast.NamedExpr(target, value) A named expression. This AST node is produced by the assignment expression operator (also known as the walrus operator). As opposed to the "Assign" node in which the first argument can be multiple nodes, in this case both "target" and "value" must be single nodes. >>> print(ast.dump(ast.parse('(x := 4)', mode='eval'), indent=4)) Expression( body=NamedExpr( target=Name(id='x', ctx=Store()), value=Constant(value=4))) Added in version 3.8. Subscripting ~~~~~~~~~~~~ class ast.Subscript(value, slice, ctx) A subscript, such as "l[1]". "value" is the subscripted object (usually a sequence or mapping). "slice" is an index, slice or key. It can be a "Tuple" and contain a "Slice". "ctx" is "Load", "Store" or "Del" according to the action performed with the subscript. >>> print(ast.dump(ast.parse('l[1:2, 3]', mode='eval'), indent=4)) Expression( body=Subscript( value=Name(id='l', ctx=Load()), slice=Tuple( elts=[ Slice( lower=Constant(value=1), upper=Constant(value=2)), Constant(value=3)], ctx=Load()), ctx=Load())) class ast.Slice(lower, upper, step) Regular slicing (of the form "lower:upper" or "lower:upper:step"). Can occur only inside the *slice* field of "Subscript", either directly or as an element of "Tuple".
>>> print(ast.dump(ast.parse('l[1:2]', mode='eval'), indent=4)) Expression( body=Subscript( value=Name(id='l', ctx=Load()), slice=Slice( lower=Constant(value=1), upper=Constant(value=2)), ctx=Load())) Comprehensions ~~~~~~~~~~~~~~ class ast.ListComp(elt, generators) class ast.SetComp(elt, generators) class ast.GeneratorExp(elt, generators) class ast.DictComp(key, value, generators) List and set comprehensions, generator expressions, and dictionary comprehensions. "elt" (or "key" and "value") is a single node representing the part that will be evaluated for each item. "generators" is a list of "comprehension" nodes. >>> print(ast.dump( ... ast.parse('[x for x in numbers]', mode='eval'), ... indent=4, ... )) Expression( body=ListComp( elt=Name(id='x', ctx=Load()), generators=[ comprehension( target=Name(id='x', ctx=Store()), iter=Name(id='numbers', ctx=Load()), is_async=0)])) >>> print(ast.dump( ... ast.parse('{x: x**2 for x in numbers}', mode='eval'), ... indent=4, ... )) Expression( body=DictComp( key=Name(id='x', ctx=Load()), value=BinOp( left=Name(id='x', ctx=Load()), op=Pow(), right=Constant(value=2)), generators=[ comprehension( target=Name(id='x', ctx=Store()), iter=Name(id='numbers', ctx=Load()), is_async=0)])) >>> print(ast.dump( ... ast.parse('{x for x in numbers}', mode='eval'), ... indent=4, ... )) Expression( body=SetComp( elt=Name(id='x', ctx=Load()), generators=[ comprehension( target=Name(id='x', ctx=Store()), iter=Name(id='numbers', ctx=Load()), is_async=0)])) class ast.comprehension(target, iter, ifs, is_async) One "for" clause in a comprehension. "target" is the reference to use for each element - typically a "Name" or "Tuple" node. "iter" is the object to iterate over. "ifs" is a list of test expressions: each "for" clause can have multiple "ifs". "is_async" indicates that the comprehension is asynchronous (using an "async for" instead of "for"). The value is an integer (0 or 1). >>> print(ast.dump(ast.parse('[ord(c) for line in file for c in line]', mode='eval'), ... indent=4)) # Multiple comprehensions in one. Expression( body=ListComp( elt=Call( func=Name(id='ord', ctx=Load()), args=[ Name(id='c', ctx=Load())]), generators=[ comprehension( target=Name(id='line', ctx=Store()), iter=Name(id='file', ctx=Load()), is_async=0), comprehension( target=Name(id='c', ctx=Store()), iter=Name(id='line', ctx=Load()), is_async=0)])) >>> print(ast.dump(ast.parse('(n**2 for n in it if n>5 if n<10)', mode='eval'), ... indent=4)) # Generator expression Expression( body=GeneratorExp( elt=BinOp( left=Name(id='n', ctx=Load()), op=Pow(), right=Constant(value=2)), generators=[ comprehension( target=Name(id='n', ctx=Store()), iter=Name(id='it', ctx=Load()), ifs=[ Compare( left=Name(id='n', ctx=Load()), ops=[ Gt()], comparators=[ Constant(value=5)]), Compare( left=Name(id='n', ctx=Load()), ops=[ Lt()], comparators=[ Constant(value=10)])], is_async=0)])) >>> print(ast.dump(ast.parse('[i async for i in soc]', mode='eval'), ... indent=4)) # Async comprehension Expression( body=ListComp( elt=Name(id='i', ctx=Load()), generators=[ comprehension( target=Name(id='i', ctx=Store()), iter=Name(id='soc', ctx=Load()), is_async=1)])) Statements ---------- class ast.Assign(targets, value, type_comment) An assignment. "targets" is a list of nodes, and "value" is a single node. Multiple nodes in "targets" represent assigning the same value to each. Unpacking is represented by putting a "Tuple" or "List" within "targets". type_comment "type_comment" is an optional string with the type annotation as a comment.
>>> print(ast.dump(ast.parse('a = b = 1'), indent=4)) # Multiple assignment Module( body=[ Assign( targets=[ Name(id='a', ctx=Store()), Name(id='b', ctx=Store())], value=Constant(value=1))]) >>> print(ast.dump(ast.parse('a,b = c'), indent=4)) # Unpacking Module( body=[ Assign( targets=[ Tuple( elts=[ Name(id='a', ctx=Store()), Name(id='b', ctx=Store())], ctx=Store())], value=Name(id='c', ctx=Load()))]) class ast.AnnAssign(target, annotation, value, simple) An assignment with a type annotation. "target" is a single node and can be a "Name", an "Attribute" or a "Subscript". "annotation" is the annotation, such as a "Constant" or "Name" node. "value" is a single optional node. "simple" is always either 0 (indicating a “complex” target) or 1 (indicating a “simple” target). A “simple” target consists solely of a "Name" node that does not appear between parentheses; all other targets are considered complex. Only simple targets appear in the "__annotations__" dictionary of modules and classes. >>> print(ast.dump(ast.parse('c: int'), indent=4)) Module( body=[ AnnAssign( target=Name(id='c', ctx=Store()), annotation=Name(id='int', ctx=Load()), simple=1)]) >>> print(ast.dump(ast.parse('(a): int = 1'), indent=4)) # Annotation with parentheses Module( body=[ AnnAssign( target=Name(id='a', ctx=Store()), annotation=Name(id='int', ctx=Load()), value=Constant(value=1), simple=0)]) >>> print(ast.dump(ast.parse('a.b: int'), indent=4)) # Attribute annotation Module( body=[ AnnAssign( target=Attribute( value=Name(id='a', ctx=Load()), attr='b', ctx=Store()), annotation=Name(id='int', ctx=Load()), simple=0)]) >>> print(ast.dump(ast.parse('a[1]: int'), indent=4)) # Subscript annotation Module( body=[ AnnAssign( target=Subscript( value=Name(id='a', ctx=Load()), slice=Constant(value=1), ctx=Store()), annotation=Name(id='int', ctx=Load()), simple=0)]) class ast.AugAssign(target, op, value) Augmented assignment, such as "a += 1". In the following example, "target" is a "Name" node for "x" (with the "Store" context), "op" is "Add", and "value" is a "Constant" with value 2. The "target" attribute cannot be of class "Tuple" or "List", unlike the targets of "Assign". >>> print(ast.dump(ast.parse('x += 2'), indent=4)) Module( body=[ AugAssign( target=Name(id='x', ctx=Store()), op=Add(), value=Constant(value=2))]) class ast.Raise(exc, cause) A "raise" statement. "exc" is the exception object to be raised, normally a "Call" or "Name", or "None" for a standalone "raise". "cause" is the optional part for "y" in "raise x from y". >>> print(ast.dump(ast.parse('raise x from y'), indent=4)) Module( body=[ Raise( exc=Name(id='x', ctx=Load()), cause=Name(id='y', ctx=Load()))]) class ast.Assert(test, msg) An assertion. "test" holds the condition, such as a "Compare" node. "msg" holds the failure message. >>> print(ast.dump(ast.parse('assert x,y'), indent=4)) Module( body=[ Assert( test=Name(id='x', ctx=Load()), msg=Name(id='y', ctx=Load()))]) class ast.Delete(targets) Represents a "del" statement. "targets" is a list of nodes, such as "Name", "Attribute" or "Subscript" nodes. >>> print(ast.dump(ast.parse('del x,y,z'), indent=4)) Module( body=[ Delete( targets=[ Name(id='x', ctx=Del()), Name(id='y', ctx=Del()), Name(id='z', ctx=Del())])]) class ast.Pass A "pass" statement. >>> print(ast.dump(ast.parse('pass'), indent=4)) Module( body=[ Pass()]) class ast.TypeAlias(name, type_params, value) A type alias created through the "type" statement.
"name" is the name of the alias, "type_params" is a list of type parameters, and "value" is the value of the type alias. >>> print(ast.dump(ast.parse('type Alias = int'), indent=4)) Module( body=[ TypeAlias( name=Name(id='Alias', ctx=Store()), value=Name(id='int', ctx=Load()))]) Added in version 3.12. Other statements which are only applicable inside functions or loops are described in other sections. Imports ~~~~~~~ class ast.Import(names) An import statement. "names" is a list of "alias" nodes. >>> print(ast.dump(ast.parse('import x,y,z'), indent=4)) Module( body=[ Import( names=[ alias(name='x'), alias(name='y'), alias(name='z')])]) class ast.ImportFrom(module, names, level) Represents "from x import y". "module" is a raw string of the ‘from’ name, without any leading dots, or "None" for statements such as "from . import foo". "level" is an integer holding the level of the relative import (0 means absolute import). >>> print(ast.dump(ast.parse('from y import x,y,z'), indent=4)) Module( body=[ ImportFrom( module='y', names=[ alias(name='x'), alias(name='y'), alias(name='z')], level=0)]) class ast.alias(name, asname) Both parameters are raw strings of the names. "asname" can be "None" if the regular name is to be used. >>> print(ast.dump(ast.parse('from ..foo.bar import a as b, c'), indent=4)) Module( body=[ ImportFrom( module='foo.bar', names=[ alias(name='a', asname='b'), alias(name='c')], level=2)]) Control flow ------------ Note: Optional clauses such as "else" are stored as an empty list if they’re not present. class ast.If(test, body, orelse) An "if" statement. "test" holds a single node, such as a "Compare" node. "body" and "orelse" each hold a list of nodes. "elif" clauses don’t have a special representation in the AST, but rather appear as extra "If" nodes within the "orelse" section of the previous one. >>> print(ast.dump(ast.parse(""" ... if x: ... ... ... elif y: ... ... ... else: ... ... ... """), indent=4)) Module( body=[ If( test=Name(id='x', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))], orelse=[ If( test=Name(id='y', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))], orelse=[ Expr( value=Constant(value=Ellipsis))])])]) class ast.For(target, iter, body, orelse, type_comment) A "for" loop. "target" holds the variable(s) the loop assigns to, as a single "Name", "Tuple", "List", "Attribute" or "Subscript" node. "iter" holds the item to be looped over, again as a single node. "body" and "orelse" contain lists of nodes to execute. Those in "orelse" are executed if the loop finishes normally, rather than via a "break" statement. type_comment "type_comment" is an optional string with the type annotation as a comment. >>> print(ast.dump(ast.parse(""" ... for x in y: ... ... ... else: ... ... ... """), indent=4)) Module( body=[ For( target=Name(id='x', ctx=Store()), iter=Name(id='y', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))], orelse=[ Expr( value=Constant(value=Ellipsis))])]) class ast.While(test, body, orelse) A "while" loop. "test" holds the condition, such as a "Compare" node. >>> print(ast.dump(ast.parse(""" ... while x: ... ... ... else: ... ... ... """), indent=4)) Module( body=[ While( test=Name(id='x', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))], orelse=[ Expr( value=Constant(value=Ellipsis))])]) class ast.Break class ast.Continue The "break" and "continue" statements. >>> print(ast.dump(ast.parse("""\ ... for a in b: ... if a > 5: ... break ... else: ... continue ... ... 
"""), indent=4)) Module( body=[ For( target=Name(id='a', ctx=Store()), iter=Name(id='b', ctx=Load()), body=[ If( test=Compare( left=Name(id='a', ctx=Load()), ops=[ Gt()], comparators=[ Constant(value=5)]), body=[ Break()], orelse=[ Continue()])])]) class ast.Try(body, handlers, orelse, finalbody) "try" blocks. All attributes are list of nodes to execute, except for "handlers", which is a list of "ExceptHandler" nodes. >>> print(ast.dump(ast.parse(""" ... try: ... ... ... except Exception: ... ... ... except OtherException as e: ... ... ... else: ... ... ... finally: ... ... ... """), indent=4)) Module( body=[ Try( body=[ Expr( value=Constant(value=Ellipsis))], handlers=[ ExceptHandler( type=Name(id='Exception', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))]), ExceptHandler( type=Name(id='OtherException', ctx=Load()), name='e', body=[ Expr( value=Constant(value=Ellipsis))])], orelse=[ Expr( value=Constant(value=Ellipsis))], finalbody=[ Expr( value=Constant(value=Ellipsis))])]) class ast.TryStar(body, handlers, orelse, finalbody) "try" blocks which are followed by "except*" clauses. The attributes are the same as for "Try" but the "ExceptHandler" nodes in "handlers" are interpreted as "except*" blocks rather then "except". >>> print(ast.dump(ast.parse(""" ... try: ... ... ... except* Exception: ... ... ... """), indent=4)) Module( body=[ TryStar( body=[ Expr( value=Constant(value=Ellipsis))], handlers=[ ExceptHandler( type=Name(id='Exception', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.11. class ast.ExceptHandler(type, name, body) A single "except" clause. "type" is the exception type it will match, typically a "Name" node (or "None" for a catch-all "except:" clause). "name" is a raw string for the name to hold the exception, or "None" if the clause doesn’t have "as foo". "body" is a list of nodes. >>> print(ast.dump(ast.parse("""\ ... try: ... a + 1 ... except TypeError: ... pass ... """), indent=4)) Module( body=[ Try( body=[ Expr( value=BinOp( left=Name(id='a', ctx=Load()), op=Add(), right=Constant(value=1)))], handlers=[ ExceptHandler( type=Name(id='TypeError', ctx=Load()), body=[ Pass()])])]) class ast.With(items, body, type_comment) A "with" block. "items" is a list of "withitem" nodes representing the context managers, and "body" is the indented block inside the context. type_comment "type_comment" is an optional string with the type annotation as a comment. class ast.withitem(context_expr, optional_vars) A single context manager in a "with" block. "context_expr" is the context manager, often a "Call" node. "optional_vars" is a "Name", "Tuple" or "List" for the "as foo" part, or "None" if that isn’t used. >>> print(ast.dump(ast.parse("""\ ... with a as b, c as d: ... something(b, d) ... """), indent=4)) Module( body=[ With( items=[ withitem( context_expr=Name(id='a', ctx=Load()), optional_vars=Name(id='b', ctx=Store())), withitem( context_expr=Name(id='c', ctx=Load()), optional_vars=Name(id='d', ctx=Store()))], body=[ Expr( value=Call( func=Name(id='something', ctx=Load()), args=[ Name(id='b', ctx=Load()), Name(id='d', ctx=Load())]))])]) Pattern matching ---------------- class ast.Match(subject, cases) A "match" statement. "subject" holds the subject of the match (the object that is being matched against the cases) and "cases" contains an iterable of "match_case" nodes with the different cases. Added in version 3.10. class ast.match_case(pattern, guard, body) A single case pattern in a "match" statement. 
"pattern" contains the match pattern that the subject will be matched against. Note that the "AST" nodes produced for patterns differ from those produced for expressions, even when they share the same syntax. The "guard" attribute contains an expression that will be evaluated if the pattern matches the subject. "body" contains a list of nodes to execute if the pattern matches and the result of evaluating the guard expression is true. >>> print(ast.dump(ast.parse(""" ... match x: ... case [x] if x>0: ... ... ... case tuple(): ... ... ... """), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchSequence( patterns=[ MatchAs(name='x')]), guard=Compare( left=Name(id='x', ctx=Load()), ops=[ Gt()], comparators=[ Constant(value=0)]), body=[ Expr( value=Constant(value=Ellipsis))]), match_case( pattern=MatchClass( cls=Name(id='tuple', ctx=Load())), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. class ast.MatchValue(value) A match literal or value pattern that compares by equality. "value" is an expression node. Permitted value nodes are restricted as described in the match statement documentation. This pattern succeeds if the match subject is equal to the evaluated value. >>> print(ast.dump(ast.parse(""" ... match x: ... case "Relevant": ... ... ... """), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchValue( value=Constant(value='Relevant')), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. class ast.MatchSingleton(value) A match literal pattern that compares by identity. "value" is the singleton to be compared against: "None", "True", or "False". This pattern succeeds if the match subject is the given constant. >>> print(ast.dump(ast.parse(""" ... match x: ... case None: ... ... ... """), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchSingleton(value=None), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. class ast.MatchSequence(patterns) A match sequence pattern. "patterns" contains the patterns to be matched against the subject elements if the subject is a sequence. Matches a variable length sequence if one of the subpatterns is a "MatchStar" node, otherwise matches a fixed length sequence. >>> print(ast.dump(ast.parse(""" ... match x: ... case [1, 2]: ... ... ... """), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchSequence( patterns=[ MatchValue( value=Constant(value=1)), MatchValue( value=Constant(value=2))]), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. class ast.MatchStar(name) Matches the rest of the sequence in a variable length match sequence pattern. If "name" is not "None", a list containing the remaining sequence elements is bound to that name if the overall sequence pattern is successful. >>> print(ast.dump(ast.parse(""" ... match x: ... case [1, 2, *rest]: ... ... ... case [*_]: ... ... ... """), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchSequence( patterns=[ MatchValue( value=Constant(value=1)), MatchValue( value=Constant(value=2)), MatchStar(name='rest')]), body=[ Expr( value=Constant(value=Ellipsis))]), match_case( pattern=MatchSequence( patterns=[ MatchStar()]), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. class ast.MatchMapping(keys, patterns, rest) A match mapping pattern. 
"keys" is a sequence of expression nodes. "patterns" is a corresponding sequence of pattern nodes. "rest" is an optional name that can be specified to capture the remaining mapping elements. Permitted key expressions are restricted as described in the match statement documentation. This pattern succeeds if the subject is a mapping, all evaluated key expressions are present in the mapping, and the value corresponding to each key matches the corresponding subpattern. If "rest" is not "None", a dict containing the remaining mapping elements is bound to that name if the overall mapping pattern is successful. >>> print(ast.dump(ast.parse(""" ... match x: ... case {1: _, 2: _}: ... ... ... case {**rest}: ... ... ... """), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchMapping( keys=[ Constant(value=1), Constant(value=2)], patterns=[ MatchAs(), MatchAs()]), body=[ Expr( value=Constant(value=Ellipsis))]), match_case( pattern=MatchMapping(rest='rest'), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. class ast.MatchClass(cls, patterns, kwd_attrs, kwd_patterns) A match class pattern. "cls" is an expression giving the nominal class to be matched. "patterns" is a sequence of pattern nodes to be matched against the class defined sequence of pattern matching attributes. "kwd_attrs" is a sequence of additional attributes to be matched (specified as keyword arguments in the class pattern), "kwd_patterns" are the corresponding patterns (specified as keyword values in the class pattern). This pattern succeeds if the subject is an instance of the nominated class, all positional patterns match the corresponding class-defined attributes, and any specified keyword attributes match their corresponding pattern. Note: classes may define a property that returns self in order to match a pattern node against the instance being matched. Several builtin types are also matched that way, as described in the match statement documentation. >>> print(ast.dump(ast.parse(""" ... match x: ... case Point2D(0, 0): ... ... ... case Point3D(x=0, y=0, z=0): ... ... ... """), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchClass( cls=Name(id='Point2D', ctx=Load()), patterns=[ MatchValue( value=Constant(value=0)), MatchValue( value=Constant(value=0))]), body=[ Expr( value=Constant(value=Ellipsis))]), match_case( pattern=MatchClass( cls=Name(id='Point3D', ctx=Load()), kwd_attrs=[ 'x', 'y', 'z'], kwd_patterns=[ MatchValue( value=Constant(value=0)), MatchValue( value=Constant(value=0)), MatchValue( value=Constant(value=0))]), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. class ast.MatchAs(pattern, name) A match “as-pattern”, capture pattern or wildcard pattern. "pattern" contains the match pattern that the subject will be matched against. If the pattern is "None", the node represents a capture pattern (i.e a bare name) and will always succeed. The "name" attribute contains the name that will be bound if the pattern is successful. If "name" is "None", "pattern" must also be "None" and the node represents the wildcard pattern. >>> print(ast.dump(ast.parse(""" ... match x: ... case [x] as y: ... ... ... case _: ... ... ... 
"""), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchAs( pattern=MatchSequence( patterns=[ MatchAs(name='x')]), name='y'), body=[ Expr( value=Constant(value=Ellipsis))]), match_case( pattern=MatchAs(), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. class ast.MatchOr(patterns) A match “or-pattern”. An or-pattern matches each of its subpatterns in turn to the subject, until one succeeds. The or-pattern is then deemed to succeed. If none of the subpatterns succeed the or- pattern fails. The "patterns" attribute contains a list of match pattern nodes that will be matched against the subject. >>> print(ast.dump(ast.parse(""" ... match x: ... case [x] | (y): ... ... ... """), indent=4)) Module( body=[ Match( subject=Name(id='x', ctx=Load()), cases=[ match_case( pattern=MatchOr( patterns=[ MatchSequence( patterns=[ MatchAs(name='x')]), MatchAs(name='y')]), body=[ Expr( value=Constant(value=Ellipsis))])])]) Added in version 3.10. Type annotations ---------------- class ast.TypeIgnore(lineno, tag) A "# type: ignore" comment located at *lineno*. *tag* is the optional tag specified by the form "# type: ignore ". >>> print(ast.dump(ast.parse('x = 1 # type: ignore', type_comments=True), indent=4)) Module( body=[ Assign( targets=[ Name(id='x', ctx=Store())], value=Constant(value=1))], type_ignores=[ TypeIgnore(lineno=1, tag='')]) >>> print(ast.dump(ast.parse('x: bool = 1 # type: ignore[assignment]', type_comments=True), indent=4)) Module( body=[ AnnAssign( target=Name(id='x', ctx=Store()), annotation=Name(id='bool', ctx=Load()), value=Constant(value=1), simple=1)], type_ignores=[ TypeIgnore(lineno=1, tag='[assignment]')]) Note: "TypeIgnore" nodes are not generated when the *type_comments* parameter is set to "False" (default). See "ast.parse()" for more details. Added in version 3.8. Type parameters --------------- Type parameters can exist on classes, functions, and type aliases. class ast.TypeVar(name, bound, default_value) A "typing.TypeVar". "name" is the name of the type variable. "bound" is the bound or constraints, if any. If "bound" is a "Tuple", it represents constraints; otherwise it represents the bound. "default_value" is the default value; if the "TypeVar" has no default, this attribute will be set to "None". >>> print(ast.dump(ast.parse("type Alias[T: int = bool] = list[T]"), indent=4)) Module( body=[ TypeAlias( name=Name(id='Alias', ctx=Store()), type_params=[ TypeVar( name='T', bound=Name(id='int', ctx=Load()), default_value=Name(id='bool', ctx=Load()))], value=Subscript( value=Name(id='list', ctx=Load()), slice=Name(id='T', ctx=Load()), ctx=Load()))]) Added in version 3.12. Changed in version 3.13: Added the *default_value* parameter. class ast.ParamSpec(name, default_value) A "typing.ParamSpec". "name" is the name of the parameter specification. "default_value" is the default value; if the "ParamSpec" has no default, this attribute will be set to "None". >>> print(ast.dump(ast.parse("type Alias[**P = [int, str]] = Callable[P, int]"), indent=4)) Module( body=[ TypeAlias( name=Name(id='Alias', ctx=Store()), type_params=[ ParamSpec( name='P', default_value=List( elts=[ Name(id='int', ctx=Load()), Name(id='str', ctx=Load())], ctx=Load()))], value=Subscript( value=Name(id='Callable', ctx=Load()), slice=Tuple( elts=[ Name(id='P', ctx=Load()), Name(id='int', ctx=Load())], ctx=Load()), ctx=Load()))]) Added in version 3.12. Changed in version 3.13: Added the *default_value* parameter. 
class ast.TypeVarTuple(name, default_value) A "typing.TypeVarTuple". "name" is the name of the type variable tuple. "default_value" is the default value; if the "TypeVarTuple" has no default, this attribute will be set to "None". >>> print(ast.dump(ast.parse("type Alias[*Ts = ()] = tuple[*Ts]"), indent=4)) Module( body=[ TypeAlias( name=Name(id='Alias', ctx=Store()), type_params=[ TypeVarTuple( name='Ts', default_value=Tuple(ctx=Load()))], value=Subscript( value=Name(id='tuple', ctx=Load()), slice=Tuple( elts=[ Starred( value=Name(id='Ts', ctx=Load()), ctx=Load())], ctx=Load()), ctx=Load()))]) Added in version 3.12. Changed in version 3.13: Added the *default_value* parameter. Function and class definitions ------------------------------ class ast.FunctionDef(name, args, body, decorator_list, returns, type_comment, type_params) A function definition. * "name" is a raw string of the function name. * "args" is an "arguments" node. * "body" is the list of nodes inside the function. * "decorator_list" is the list of decorators to be applied, stored outermost first (i.e. the first in the list will be applied last). * "returns" is the return annotation. * "type_params" is a list of type parameters. type_comment "type_comment" is an optional string with the type annotation as a comment. Changed in version 3.12: Added "type_params". class ast.Lambda(args, body) "lambda" is a minimal function definition that can be used inside an expression. Unlike "FunctionDef", "body" holds a single node. >>> print(ast.dump(ast.parse('lambda x,y: ...'), indent=4)) Module( body=[ Expr( value=Lambda( args=arguments( args=[ arg(arg='x'), arg(arg='y')]), body=Constant(value=Ellipsis)))]) class ast.arguments(posonlyargs, args, vararg, kwonlyargs, kw_defaults, kwarg, defaults) The arguments for a function. * "posonlyargs", "args" and "kwonlyargs" are lists of "arg" nodes. * "vararg" and "kwarg" are single "arg" nodes, referring to the "*args, **kwargs" parameters. * "kw_defaults" is a list of default values for keyword-only arguments. If one is "None", the corresponding argument is required. * "defaults" is a list of default values for arguments that can be passed positionally. If there are fewer defaults, they correspond to the last n arguments. class ast.arg(arg, annotation, type_comment) A single argument in a list. "arg" is a raw string of the argument name; "annotation" is its annotation, such as a "Name" node. type_comment "type_comment" is an optional string with the type annotation as a comment >>> print(ast.dump(ast.parse("""\ ... @decorator1 ... @decorator2 ... def f(a: 'annotation', b=1, c=2, *d, e, f=3, **g) -> 'return annotation': ... pass ... """), indent=4)) Module( body=[ FunctionDef( name='f', args=arguments( args=[ arg( arg='a', annotation=Constant(value='annotation')), arg(arg='b'), arg(arg='c')], vararg=arg(arg='d'), kwonlyargs=[ arg(arg='e'), arg(arg='f')], kw_defaults=[ None, Constant(value=3)], kwarg=arg(arg='g'), defaults=[ Constant(value=1), Constant(value=2)]), body=[ Pass()], decorator_list=[ Name(id='decorator1', ctx=Load()), Name(id='decorator2', ctx=Load())], returns=Constant(value='return annotation'))]) class ast.Return(value) A "return" statement. >>> print(ast.dump(ast.parse('return 4'), indent=4)) Module( body=[ Return( value=Constant(value=4))]) class ast.Yield(value) class ast.YieldFrom(value) A "yield" or "yield from" expression. Because these are expressions, they must be wrapped in an "Expr" node if the value sent back is not used. 
>>> print(ast.dump(ast.parse('yield x'), indent=4)) Module( body=[ Expr( value=Yield( value=Name(id='x', ctx=Load())))]) >>> print(ast.dump(ast.parse('yield from x'), indent=4)) Module( body=[ Expr( value=YieldFrom( value=Name(id='x', ctx=Load())))]) class ast.Global(names) class ast.Nonlocal(names) "global" and "nonlocal" statements. "names" is a list of raw strings. >>> print(ast.dump(ast.parse('global x,y,z'), indent=4)) Module( body=[ Global( names=[ 'x', 'y', 'z'])]) >>> print(ast.dump(ast.parse('nonlocal x,y,z'), indent=4)) Module( body=[ Nonlocal( names=[ 'x', 'y', 'z'])]) class ast.ClassDef(name, bases, keywords, body, decorator_list, type_params) A class definition. * "name" is a raw string for the class name. * "bases" is a list of nodes for explicitly specified base classes. * "keywords" is a list of "keyword" nodes, principally for ‘metaclass’. Other keywords will be passed to the metaclass, as per **PEP 3115**. * "body" is a list of nodes representing the code within the class definition. * "decorator_list" is a list of nodes, as in "FunctionDef". * "type_params" is a list of type parameters. >>> print(ast.dump(ast.parse("""\ ... @decorator1 ... @decorator2 ... class Foo(base1, base2, metaclass=meta): ... pass ... """), indent=4)) Module( body=[ ClassDef( name='Foo', bases=[ Name(id='base1', ctx=Load()), Name(id='base2', ctx=Load())], keywords=[ keyword( arg='metaclass', value=Name(id='meta', ctx=Load()))], body=[ Pass()], decorator_list=[ Name(id='decorator1', ctx=Load()), Name(id='decorator2', ctx=Load())])]) Changed in version 3.12: Added "type_params". Async and await --------------- class ast.AsyncFunctionDef(name, args, body, decorator_list, returns, type_comment, type_params) An "async def" function definition. Has the same fields as "FunctionDef". Changed in version 3.12: Added "type_params". class ast.Await(value) An "await" expression. "value" is what it waits for. Only valid in the body of an "AsyncFunctionDef". >>> print(ast.dump(ast.parse("""\ ... async def f(): ... await other_func() ... """), indent=4)) Module( body=[ AsyncFunctionDef( name='f', args=arguments(), body=[ Expr( value=Await( value=Call( func=Name(id='other_func', ctx=Load()))))])]) class ast.AsyncFor(target, iter, body, orelse, type_comment) class ast.AsyncWith(items, body, type_comment) "async for" loops and "async with" context managers. They have the same fields as "For" and "With", respectively. Only valid in the body of an "AsyncFunctionDef". Note: When a string is parsed by "ast.parse()", operator nodes (subclasses of "ast.operator", "ast.unaryop", "ast.cmpop", "ast.boolop" and "ast.expr_context") on the returned tree will be singletons. Changes to one will be reflected in all other occurrences of the same value (e.g. "ast.Add"). "ast" Helpers ============= Apart from the node classes, the "ast" module defines these utility functions and classes for traversing abstract syntax trees: ast.parse(source, filename='<unknown>', mode='exec', *, type_comments=False, feature_version=None, optimize=-1) Parse the source into an AST node. Equivalent to "compile(source, filename, mode, flags=FLAGS_VALUE, optimize=optimize)", where "FLAGS_VALUE" is "ast.PyCF_ONLY_AST" if "optimize <= 0" and "ast.PyCF_OPTIMIZED_AST" otherwise. If "type_comments=True" is given, the parser is modified to check and return type comments as specified by **PEP 484** and **PEP 526**. This is equivalent to adding "ast.PyCF_TYPE_COMMENTS" to the flags passed to "compile()". This will report syntax errors for misplaced type comments.
Without this flag, type comments will be ignored, and the "type_comment" field on selected AST nodes will always be "None". In addition, the locations of "# type: ignore" comments will be returned as the "type_ignores" attribute of "Module" (otherwise it is always an empty list). In addition, if "mode" is "'func_type'", the input syntax is modified to correspond to **PEP 484** “signature type comments”, e.g. "(str, int) -> List[str]". Setting "feature_version" to a tuple "(major, minor)" will result in a “best-effort” attempt to parse using that Python version’s grammar. For example, setting "feature_version=(3, 9)" will attempt to disallow parsing of "match" statements. Currently, "major" must equal "3". The lowest supported version is "(3, 7)" (and this may increase in future Python versions); the highest is "sys.version_info[0:2]". “Best-effort” attempt means there is no guarantee that the parse (or success of the parse) is the same as when run on the Python version corresponding to "feature_version". If source contains a null character ("\0"), "ValueError" is raised. Warning: Note that successfully parsing source code into an AST object doesn’t guarantee that the source code provided is valid Python code that can be executed, as the compilation step can raise further "SyntaxError" exceptions. For instance, the source "return 42" generates a valid AST node for a return statement, but it cannot be compiled alone (it needs to be inside a function node). In particular, "ast.parse()" won’t do any scoping checks, which the compilation step does. Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler. Changed in version 3.8: Added "type_comments", "mode='func_type'" and "feature_version". Changed in version 3.13: The minimum supported version for "feature_version" is now "(3, 7)". The "optimize" argument was added. ast.unparse(ast_obj) Unparse an "ast.AST" object and generate a string with code that would produce an equivalent "ast.AST" object if parsed back with "ast.parse()". Warning: The produced code string will not necessarily be equal to the original code that generated the "ast.AST" object (without any compiler optimizations, such as constant tuples/frozensets). Warning: Trying to unparse a highly complex expression would result in a "RecursionError". Added in version 3.9. ast.literal_eval(node_or_string) Evaluate an expression node or a string containing only a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, "None" and "Ellipsis". This can be used for evaluating strings containing Python values without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing. This function had been documented as “safe” in the past without defining what that meant. That was misleading. This is specifically designed not to execute Python code, unlike the more general "eval()". There is no namespace, no name lookups, and no ability to call out. But it is not free from attack: A relatively small input can lead to memory exhaustion or to C stack exhaustion, crashing the process. There is also the possibility of a denial of service through excessive CPU consumption on some inputs. Calling it on untrusted data is thus not recommended.
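To make the intended scope concrete, here is a small usage sketch (added for illustration, not from the upstream reference): literal and container displays evaluate, while anything that would require execution is rejected with "ValueError":

   import ast

   # Literal containers evaluate to the corresponding Python objects.
   config = ast.literal_eval("{'retries': 3, 'hosts': ('a', 'b')}")
   print(config["retries"])  # 3

   # Calls, names, and general expressions are rejected, not executed.
   try:
       ast.literal_eval("__import__('os').system('echo hi')")
   except ValueError as exc:
       print("rejected:", exc)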
Warning: It is possible to crash the Python interpreter due to stack depth limitations in Python’s AST compiler. It can raise "ValueError", "TypeError", "SyntaxError", "MemoryError" and "RecursionError" depending on the malformed input. Changed in version 3.2: Now allows bytes and set literals. Changed in version 3.9: Now supports creating empty sets with "'set()'". Changed in version 3.10: For string inputs, leading spaces and tabs are now stripped. ast.get_docstring(node, clean=True) Return the docstring of the given *node* (which must be a "FunctionDef", "AsyncFunctionDef", "ClassDef", or "Module" node), or "None" if it has no docstring. If *clean* is true, clean up the docstring’s indentation with "inspect.cleandoc()". Changed in version 3.5: "AsyncFunctionDef" is now supported. ast.get_source_segment(source, node, *, padded=False) Get source code segment of the *source* that generated *node*. If some location information ("lineno", "end_lineno", "col_offset", or "end_col_offset") is missing, return "None". If *padded* is "True", the first line of a multi-line statement will be padded with spaces to match its original position. Added in version 3.8. ast.fix_missing_locations(node) When you compile a node tree with "compile()", the compiler expects "lineno" and "col_offset" attributes for every node that supports them. This is rather tedious to fill in for generated nodes, so this helper adds these attributes recursively where not already set, by setting them to the values of the parent node. It works recursively starting at *node*. ast.increment_lineno(node, n=1) Increment the line number and end line number of each node in the tree starting at *node* by *n*. This is useful to “move code” to a different location in a file. ast.copy_location(new_node, old_node) Copy source location ("lineno", "col_offset", "end_lineno", and "end_col_offset") from *old_node* to *new_node* if possible, and return *new_node*. ast.iter_fields(node) Yield a tuple of "(fieldname, value)" for each field in "node._fields" that is present on *node*. ast.iter_child_nodes(node) Yield all direct child nodes of *node*, that is, all fields that are nodes and all items of fields that are lists of nodes. ast.walk(node) Recursively yield all descendant nodes in the tree starting at *node* (including *node* itself), in no specified order. This is useful if you only want to modify nodes in place and don’t care about the context. class ast.NodeVisitor A node visitor base class that walks the abstract syntax tree and calls a visitor function for every node found. This function may return a value which is forwarded by the "visit()" method. This class is meant to be subclassed, with the subclass adding visitor methods. visit(node) Visit a node. The default implementation calls the method called "self.visit_*classname*" where *classname* is the name of the node class, or "generic_visit()" if that method doesn’t exist. generic_visit(node) This visitor calls "visit()" on all children of the node. Note that child nodes of nodes that have a custom visitor method won’t be visited unless the visitor calls "generic_visit()" or visits them itself. visit_Constant(node) Handles all constant nodes. Don’t use the "NodeVisitor" if you want to apply changes to nodes during traversal. For this a special visitor exists ("NodeTransformer") that allows modifications.
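As a minimal sketch of such a subclass (the "NameCollector" class is illustrative only, not part of the module), note how the overridden method delegates to "generic_visit()" so that traversal continues into child nodes:

   import ast

   class NameCollector(ast.NodeVisitor):
       """Record every identifier encountered in the tree."""

       def __init__(self):
           self.names = []

       def visit_Name(self, node):
           self.names.append(node.id)
           # Without this call (or manual visits), the children of nodes
           # that have a custom visitor method would be skipped.
           self.generic_visit(node)

   collector = NameCollector()
   collector.visit(ast.parse("x = y + z"))
   print(collector.names)  # ['x', 'y', 'z'] (assignment targets come first)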
Deprecated since version 3.8: Methods "visit_Num()", "visit_Str()", "visit_Bytes()", "visit_NameConstant()" and "visit_Ellipsis()" are deprecated now and will not be called in future Python versions. Add the "visit_Constant()" method to handle all constant nodes. class ast.NodeTransformer A "NodeVisitor" subclass that walks the abstract syntax tree and allows modification of nodes. The "NodeTransformer" will walk the AST and use the return value of the visitor methods to replace or remove the old node. If the return value of the visitor method is "None", the node will be removed from its location, otherwise it is replaced with the return value. The return value may be the original node in which case no replacement takes place. Here is an example transformer that rewrites all occurrences of name lookups ("foo") to "data['foo']": class RewriteName(NodeTransformer): def visit_Name(self, node): return Subscript( value=Name(id='data', ctx=Load()), slice=Constant(value=node.id), ctx=node.ctx ) Keep in mind that if the node you’re operating on has child nodes you must either transform the child nodes yourself or call the "generic_visit()" method for the node first. For nodes that were part of a collection of statements (that applies to all statement nodes), the visitor may also return a list of nodes rather than just a single node. If "NodeTransformer" introduces new nodes (that weren’t part of the original tree) without giving them location information (such as "lineno"), "fix_missing_locations()" should be called with the new sub-tree to recalculate the location information: tree = ast.parse('foo', mode='eval') new_tree = fix_missing_locations(RewriteName().visit(tree)) Usually you use the transformer like this: node = YourTransformer().visit(node) ast.dump(node, annotate_fields=True, include_attributes=False, *, indent=None, show_empty=False) Return a formatted dump of the tree in *node*. This is mainly useful for debugging purposes. If *annotate_fields* is true (by default), the returned string will show the names and the values for fields. If *annotate_fields* is false, the result string will be more compact by omitting unambiguous field names. Attributes such as line numbers and column offsets are not dumped by default. If this is wanted, *include_attributes* can be set to true. If *indent* is a non-negative integer or string, then the tree will be pretty-printed with that indent level. An indent level of 0, a negative number, or an empty string ("") will only insert newlines. "None" (the default) selects the single line representation. Using a positive integer indent indents that many spaces per level. If *indent* is a string (such as ""\t""), that string is used to indent each level. If *show_empty* is false (the default), optional empty lists will be omitted from the output. Optional "None" values are always omitted. Changed in version 3.9: Added the *indent* option. Changed in version 3.13: Added the *show_empty* option. >>> print(ast.dump(ast.parse("""\ ... async def f(): ... await other_func() ...
"""), indent=4, show_empty=True)) Module( body=[ AsyncFunctionDef( name='f', args=arguments( posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[ Expr( value=Await( value=Call( func=Name(id='other_func', ctx=Load()), args=[], keywords=[])))], decorator_list=[], type_params=[])], type_ignores=[]) Compiler Flags ============== The following flags may be passed to "compile()" in order to change effects on the compilation of a program: ast.PyCF_ALLOW_TOP_LEVEL_AWAIT Enables support for top-level "await", "async for", "async with" and async comprehensions. Added in version 3.8. ast.PyCF_ONLY_AST Generates and returns an abstract syntax tree instead of returning a compiled code object. ast.PyCF_OPTIMIZED_AST The returned AST is optimized according to the *optimize* argument in "compile()" or "ast.parse()". Added in version 3.13. ast.PyCF_TYPE_COMMENTS Enables support for **PEP 484** and **PEP 526** style type comments ("# type: ", "# type: ignore "). Added in version 3.8. Command-Line Usage ================== Added in version 3.9. The "ast" module can be executed as a script from the command line. It is as simple as: python -m ast [-m ] [-a] [infile] The following options are accepted: -h, --help Show the help message and exit. -m --mode Specify what kind of code must be compiled, like the *mode* argument in "parse()". --no-type-comments Don’t parse type comments. -a, --include-attributes Include attributes such as line numbers and column offsets. -i --indent Indentation of nodes in AST (number of spaces). If "infile" is specified its contents are parsed to AST and dumped to stdout. Otherwise, the content is read from stdin. See also: Green Tree Snakes, an external documentation resource, has good details on working with Python ASTs. ASTTokens annotates Python ASTs with the positions of tokens and text in the source code that generated them. This is helpful for tools that make source code transformations. leoAst.py unifies the token-based and parse-tree-based views of python programs by inserting two-way links between tokens and ast nodes. LibCST parses code as a Concrete Syntax Tree that looks like an ast tree and keeps all formatting details. It’s useful for building automated refactoring (codemod) applications and linters. Parso is a Python parser that supports error recovery and round-trip parsing for different Python versions (in multiple Python versions). Parso is also able to list multiple syntax errors in your Python file. "asynchat" — Asynchronous socket command/response handler ********************************************************* Deprecated since version 3.6, removed in version 3.12. This module is no longer part of the Python standard library. It was removed in Python 3.12 after being deprecated in Python 3.6. The removal was decided in **PEP 594**. Applications should use the "asyncio" module instead. The last version of Python that provided the "asynchat" module was Python 3.11. High-level API Index ******************** This page lists all high-level async/await enabled asyncio APIs. Tasks ===== Utilities to run asyncio programs, create Tasks, and await on multiple things with timeouts. +----------------------------------------------------+----------------------------------------------------+ | "run()" | Create event loop, run a coroutine, close the | | | loop. | +----------------------------------------------------+----------------------------------------------------+ | "Runner" | A context manager that simplifies multiple async | | | function calls. 
| +----------------------------------------------------+----------------------------------------------------+ | "Task" | Task object. | +----------------------------------------------------+----------------------------------------------------+ | "TaskGroup" | A context manager that holds a group of tasks. | | | Provides a convenient and reliable way to wait for | | | all tasks in the group to finish. | +----------------------------------------------------+----------------------------------------------------+ | "create_task()" | Start an asyncio Task, then returns it. | +----------------------------------------------------+----------------------------------------------------+ | "current_task()" | Return the current Task. | +----------------------------------------------------+----------------------------------------------------+ | "all_tasks()" | Return all tasks that are not yet finished for an | | | event loop. | +----------------------------------------------------+----------------------------------------------------+ | "await" "sleep()" | Sleep for a number of seconds. | +----------------------------------------------------+----------------------------------------------------+ | "await" "gather()" | Schedule and wait for things concurrently. | +----------------------------------------------------+----------------------------------------------------+ | "await" "wait_for()" | Run with a timeout. | +----------------------------------------------------+----------------------------------------------------+ | "await" "shield()" | Shield from cancellation. | +----------------------------------------------------+----------------------------------------------------+ | "await" "wait()" | Monitor for completion. | +----------------------------------------------------+----------------------------------------------------+ | "timeout()" | Run with a timeout. Useful in cases when | | | "wait_for" is not suitable. | +----------------------------------------------------+----------------------------------------------------+ | "to_thread()" | Asynchronously run a function in a separate OS | | | thread. | +----------------------------------------------------+----------------------------------------------------+ | "run_coroutine_threadsafe()" | Schedule a coroutine from another OS thread. | +----------------------------------------------------+----------------------------------------------------+ | "for in" "as_completed()" | Monitor for completion with a "for" loop. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Using asyncio.gather() to run things in parallel. * Using asyncio.wait_for() to enforce a timeout. * Cancellation. * Using asyncio.sleep(). * See also the main Tasks documentation page. Queues ====== Queues should be used to distribute work amongst multiple asyncio Tasks, implement connection pools, and pub/sub patterns. +----------------------------------------------------+----------------------------------------------------+ | "Queue" | A FIFO queue. | +----------------------------------------------------+----------------------------------------------------+ | "PriorityQueue" | A priority queue. | +----------------------------------------------------+----------------------------------------------------+ | "LifoQueue" | A LIFO queue. 
| +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Using asyncio.Queue to distribute workload between several Tasks. * See also the Queues documentation page. Subprocesses ============ Utilities to spawn subprocesses and run shell commands. +----------------------------------------------------+----------------------------------------------------+ | "await" "create_subprocess_exec()" | Create a subprocess. | +----------------------------------------------------+----------------------------------------------------+ | "await" "create_subprocess_shell()" | Run a shell command. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Executing a shell command. * See also the subprocess APIs documentation. Streams ======= High-level APIs to work with network IO. +----------------------------------------------------+----------------------------------------------------+ | "await" "open_connection()" | Establish a TCP connection. | +----------------------------------------------------+----------------------------------------------------+ | "await" "open_unix_connection()" | Establish a Unix socket connection. | +----------------------------------------------------+----------------------------------------------------+ | "await" "start_server()" | Start a TCP server. | +----------------------------------------------------+----------------------------------------------------+ | "await" "start_unix_server()" | Start a Unix socket server. | +----------------------------------------------------+----------------------------------------------------+ | "StreamReader" | High-level async/await object to receive network | | | data. | +----------------------------------------------------+----------------------------------------------------+ | "StreamWriter" | High-level async/await object to send network | | | data. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Example TCP client. * See also the streams APIs documentation. Synchronization =============== Threading-like synchronization primitives that can be used in Tasks. +----------------------------------------------------+----------------------------------------------------+ | "Lock" | A mutex lock. | +----------------------------------------------------+----------------------------------------------------+ | "Event" | An event object. | +----------------------------------------------------+----------------------------------------------------+ | "Condition" | A condition object. | +----------------------------------------------------+----------------------------------------------------+ | "Semaphore" | A semaphore. | +----------------------------------------------------+----------------------------------------------------+ | "BoundedSemaphore" | A bounded semaphore. | +----------------------------------------------------+----------------------------------------------------+ | "Barrier" | A barrier object. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Using asyncio.Event. * Using asyncio.Barrier. * See also the documentation of asyncio synchronization primitives. Exceptions ========== +----------------------------------------------------+----------------------------------------------------+ | "asyncio.CancelledError" | Raised when a Task is cancelled. 
See also | | "Task.cancel()". | +----------------------------------------------------+----------------------------------------------------+ | "asyncio.BrokenBarrierError" | Raised when a Barrier is broken. See also | | "Barrier.wait()". | +----------------------------------------------------+----------------------------------------------------+

-[ Examples ]-

* Handling CancelledError to run code on cancellation request.

* See also the full list of asyncio-specific exceptions.

Developing with asyncio
***********************

Asynchronous programming is different from classic “sequential” programming. This page lists common mistakes and traps and explains how to avoid them.

Debug Mode
==========

By default asyncio runs in production mode. To ease development, asyncio has a *debug mode*.

There are several ways to enable asyncio debug mode:

* Setting the "PYTHONASYNCIODEBUG" environment variable to "1".

* Using the Python Development Mode.

* Passing "debug=True" to "asyncio.run()".

* Calling "loop.set_debug()".

In addition to enabling the debug mode, consider also:

* setting the log level of the asyncio logger to "logging.DEBUG", for example the following snippet of code can be run at startup of the application:

     logging.basicConfig(level=logging.DEBUG)

* configuring the "warnings" module to display "ResourceWarning" warnings. One way of doing that is by using the "-W" "default" command line option.

When the debug mode is enabled:

* Many non-threadsafe asyncio APIs (such as "loop.call_soon()" and "loop.call_at()" methods) raise an exception if they are called from a wrong thread.

* The execution time of the I/O selector is logged if it takes too long to perform an I/O operation.

* Callbacks taking longer than 100 milliseconds are logged. The "loop.slow_callback_duration" attribute can be used to set the minimum execution duration in seconds that is considered “slow”.

Concurrency and Multithreading
==============================

An event loop runs in a thread (typically the main thread) and executes all callbacks and Tasks in its thread. While a Task is running in the event loop, no other Tasks can run in the same thread. When a Task executes an "await" expression, the running Task gets suspended, and the event loop executes the next Task.

To schedule a *callback* from another OS thread, the "loop.call_soon_threadsafe()" method should be used. Example:

   loop.call_soon_threadsafe(callback, *args)

Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there’s a need for such code to call a low-level asyncio API, the "loop.call_soon_threadsafe()" method should be used, e.g.:

   loop.call_soon_threadsafe(fut.cancel)

To schedule a coroutine object from a different OS thread, the "run_coroutine_threadsafe()" function should be used. It returns a "concurrent.futures.Future" to access the result:

   async def coro_func():
        return await asyncio.sleep(1, 42)

   # Later in another OS thread:

   future = asyncio.run_coroutine_threadsafe(coro_func(), loop)
   # Wait for the result:
   result = future.result()

To handle signals the event loop must be run in the main thread.

The "loop.run_in_executor()" method can be used with a "concurrent.futures.ThreadPoolExecutor" to execute blocking code in a different OS thread without blocking the OS thread that the event loop runs in.
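For example, a helper thread might hand work back to a running loop using both mechanisms above. This is a minimal, self-contained sketch; the helper names "worker" and "tick" are illustrative only:

   import asyncio
   import threading

   def tick() -> None:
       # Runs inside the event loop thread.
       print("callback scheduled from another thread")

   async def add(a: int, b: int) -> int:
       await asyncio.sleep(0.1)
       return a + b

   def worker(loop: asyncio.AbstractEventLoop) -> None:
       # Schedule a plain callback on the loop (thread-safe variant).
       loop.call_soon_threadsafe(tick)
       # Schedule a coroutine and wait for its result through a
       # concurrent.futures.Future.
       future = asyncio.run_coroutine_threadsafe(add(1, 2), loop)
       print("result from the loop:", future.result())

   async def main() -> None:
       loop = asyncio.get_running_loop()
       thread = threading.Thread(target=worker, args=(loop,))
       thread.start()
       # Keep the loop running long enough for the worker to finish.
       await asyncio.sleep(0.5)
       thread.join()

   asyncio.run(main())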
There is currently no way to schedule coroutines or callbacks directly from a different process (such as one started with "multiprocessing"). The Event Loop Methods section lists APIs that can read from pipes and watch file descriptors without blocking the event loop. In addition, asyncio’s Subprocess APIs provide a way to start a process and communicate with it from the event loop. Lastly, the aforementioned "loop.run_in_executor()" method can also be used with a "concurrent.futures.ProcessPoolExecutor" to execute code in a different process.

Running Blocking Code
=====================

Blocking (CPU-bound) code should not be called directly. For example, if a function performs a CPU-intensive calculation for 1 second, all concurrent asyncio Tasks and IO operations would be delayed by 1 second.

An executor can be used to run a task in a different thread or even in a different process to avoid blocking the OS thread with the event loop. See the "loop.run_in_executor()" method for more details.

Logging
=======

asyncio uses the "logging" module and all logging is performed via the ""asyncio"" logger.

The default log level is "logging.INFO", which can be easily adjusted:

   logging.getLogger("asyncio").setLevel(logging.WARNING)

Network logging can block the event loop. It is recommended to use a separate thread for handling logs or use non-blocking IO. For example, see Dealing with handlers that block.

Detect never-awaited coroutines
===============================

When a coroutine function is called, but not awaited (e.g. "coro()" instead of "await coro()") or the coroutine is not scheduled with "asyncio.create_task()", asyncio will emit a "RuntimeWarning":

   import asyncio

   async def test():
       print("never scheduled")

   async def main():
       test()

   asyncio.run(main())

Output:

   test.py:7: RuntimeWarning: coroutine 'test' was never awaited
     test()

Output in debug mode:

   test.py:7: RuntimeWarning: coroutine 'test' was never awaited
   Coroutine created at (most recent call last)
     File "../t.py", line 9, in <module>
       asyncio.run(main(), debug=True)
     < .. >
     File "../t.py", line 7, in main
       test()
     test()

The usual fix is to either await the coroutine or call the "asyncio.create_task()" function:

   async def main():
       await test()

Detect never-retrieved exceptions
=================================

If a "Future.set_exception()" is called but the Future object is never awaited on, the exception would never be propagated to the user code. In this case, asyncio would emit a log message when the Future object is garbage collected.

Example of an unhandled exception:

   import asyncio

   async def bug():
       raise Exception("not consumed")

   async def main():
       asyncio.create_task(bug())

   asyncio.run(main())

Output:

   Task exception was never retrieved
   future: <Task finished coro=<bug() done, defined at test.py:3> exception=Exception('not consumed')>
   Traceback (most recent call last):
     File "test.py", line 4, in bug
       raise Exception("not consumed")
   Exception: not consumed

Enable the debug mode to get the traceback where the task was created:

   asyncio.run(main(), debug=True)

Output in debug mode:

   Task exception was never retrieved
   future: <Task finished coro=<bug() done, defined at ../t.py:3> exception=Exception('not consumed') created at asyncio/tasks.py:321>
   source_traceback: Object created at (most recent call last):
     File "../t.py", line 9, in <module>
       asyncio.run(main(), debug=True)
     < .. >
   Traceback (most recent call last):
     File "../t.py", line 4, in bug
       raise Exception("not consumed")
   Exception: not consumed
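The corresponding fix is to retrieve the task’s result somewhere. As an illustrative sketch, awaiting the task makes the exception propagate to the awaiting code, where it can be handled:

   import asyncio

   async def bug():
       raise Exception("not consumed")

   async def main():
       task = asyncio.create_task(bug())
       try:
           # Awaiting the task retrieves (and here handles) its exception.
           await task
       except Exception as exc:
           print("handled:", exc)

   asyncio.run(main())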
Event Loop
**********

**Source code:** Lib/asyncio/events.py, Lib/asyncio/base_events.py

======================================================================

-[ Preface ]-

The event loop is the core of every asyncio application. Event loops run asynchronous tasks and callbacks, perform network IO operations, and run subprocesses.

Application developers should typically use the high-level asyncio functions, such as "asyncio.run()", and should rarely need to reference the loop object or call its methods. This section is intended mostly for authors of lower-level code, libraries, and frameworks, who need finer control over the event loop behavior.

-[ Obtaining the Event Loop ]-

The following low-level functions can be used to get, set, or create an event loop:

asyncio.get_running_loop()

   Return the running event loop in the current OS thread.

   Raise a "RuntimeError" if there is no running event loop.

   This function can only be called from a coroutine or a callback.

   Added in version 3.7.

asyncio.get_event_loop()

   Get the current event loop.

   When called from a coroutine or a callback (e.g. scheduled with call_soon or similar API), this function will always return the running event loop.

   If there is no running event loop set, the function will return the result of the "get_event_loop_policy().get_event_loop()" call.

   Because this function has rather complex behavior (especially when custom event loop policies are in use), using the "get_running_loop()" function is preferred to "get_event_loop()" in coroutines and callbacks.

   As noted above, consider using the higher-level "asyncio.run()" function, instead of using these lower level functions to manually create and close an event loop.

   Deprecated since version 3.12: Deprecation warning is emitted if there is no current event loop. In some future Python release this will become an error.

asyncio.set_event_loop(loop)

   Set *loop* as the current event loop for the current OS thread.

asyncio.new_event_loop()

   Create and return a new event loop object.

Note that the behaviour of "get_event_loop()", "set_event_loop()", and "new_event_loop()" functions can be altered by setting a custom event loop policy.
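As a brief sketch of the distinction, "get_running_loop()" works only while a loop is running, whereas "new_event_loop()" and friends let lower-level code manage the loop’s lifetime by hand:

   import asyncio

   async def report() -> None:
       # Inside a coroutine there is always a running loop.
       loop = asyncio.get_running_loop()
       print("running loop:", type(loop).__name__)

   # Preferred: let asyncio.run() create and close the loop.
   asyncio.run(report())

   # Low-level equivalent, for code that needs direct control
   # over the loop's lifetime:
   loop = asyncio.new_event_loop()
   try:
       asyncio.set_event_loop(loop)
       loop.run_until_complete(report())
   finally:
       loop.close()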
-[ Contents ]-

This documentation page contains the following sections:

* The Event Loop Methods section is the reference documentation of the event loop APIs;

* The Callback Handles section documents the "Handle" and "TimerHandle" instances which are returned from scheduling methods such as "loop.call_soon()" and "loop.call_later()";

* The Server Objects section documents types returned from event loop methods like "loop.create_server()";

* The Event Loop Implementations section documents the "SelectorEventLoop" and "ProactorEventLoop" classes;

* The Examples section showcases how to work with some event loop APIs.

Event Loop Methods
==================

Event loops have **low-level** APIs for the following:

* Running and stopping the loop
* Scheduling callbacks
* Scheduling delayed callbacks
* Creating Futures and Tasks
* Opening network connections
* Creating network servers
* Transferring files
* TLS Upgrade
* Watching file descriptors
* Working with socket objects directly
* DNS
* Working with pipes
* Unix signals
* Executing code in thread or process pools
* Error Handling API
* Enabling debug mode
* Running Subprocesses

Running and stopping the loop
-----------------------------

loop.run_until_complete(future)

   Run until the *future* (an instance of "Future") has completed.

   If the argument is a coroutine object it is implicitly scheduled to run as an "asyncio.Task".

   Return the Future’s result or raise its exception.

loop.run_forever()

   Run the event loop until "stop()" is called.

   If "stop()" is called before "run_forever()" is called, the loop will poll the I/O selector once with a timeout of zero, run all callbacks scheduled in response to I/O events (and those that were already scheduled), and then exit.

   If "stop()" is called while "run_forever()" is running, the loop will run the current batch of callbacks and then exit. Note that new callbacks scheduled by callbacks will not run in this case; instead, they will run the next time "run_forever()" or "run_until_complete()" is called.

loop.stop()

   Stop the event loop.

loop.is_running()

   Return "True" if the event loop is currently running.

loop.is_closed()

   Return "True" if the event loop was closed.

loop.close()

   Close the event loop.

   The loop must not be running when this function is called. Any pending callbacks will be discarded.

   This method clears all queues and shuts down the executor, but does not wait for the executor to finish.

   This method is idempotent and irreversible. No other methods should be called after the event loop is closed.

async loop.shutdown_asyncgens()

   Schedule all currently open *asynchronous generator* objects to close with an "aclose()" call. After calling this method, the event loop will issue a warning if a new asynchronous generator is iterated. This should be used to reliably finalize all scheduled asynchronous generators.

   Note that there is no need to call this function when "asyncio.run()" is used.

   Example:

      try:
          loop.run_forever()
      finally:
          loop.run_until_complete(loop.shutdown_asyncgens())
          loop.close()

   Added in version 3.6.

async loop.shutdown_default_executor(timeout=None)

   Schedule the closure of the default executor and wait for it to join all of the threads in the "ThreadPoolExecutor". Once this method has been called, using the default executor with "loop.run_in_executor()" will raise a "RuntimeError".

   The *timeout* parameter specifies the amount of time (in "float" seconds) the executor will be given to finish joining. With the default, "None", the executor is allowed an unlimited amount of time.

   If the *timeout* is reached, a "RuntimeWarning" is emitted and the default executor is terminated without waiting for its threads to finish joining.

   Note: Do not call this method when using "asyncio.run()", as the latter handles default executor shutdown automatically.

   Added in version 3.9.

   Changed in version 3.12: Added the *timeout* parameter.
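The methods above combine into the classic manual lifecycle. A hedged sketch, using "loop.call_later()" (documented in the next section) purely as a way to trigger "stop()":

   import asyncio

   loop = asyncio.new_event_loop()
   try:
       # Arrange for the loop to stop itself after one second.
       loop.call_later(1.0, loop.stop)
       loop.run_forever()
   finally:
       # Finalize async generators, then close the loop for good.
       loop.run_until_complete(loop.shutdown_asyncgens())
       loop.close()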
Return an instance of "asyncio.Handle", which can be used later to cancel the callback. Callbacks are called in the order in which they are registered. Each callback will be called exactly once. The optional keyword-only *context* argument specifies a custom "contextvars.Context" for the *callback* to run in. Callbacks use the current context when no *context* is provided. Unlike "call_soon_threadsafe()", this method is not thread-safe. loop.call_soon_threadsafe(callback, *args, context=None) A thread-safe variant of "call_soon()". When scheduling callbacks from another thread, this function *must* be used, since "call_soon()" is not thread-safe. This function is safe to be called from a reentrant context or signal handler, however, it is not safe or fruitful to use the returned handle in such contexts. Raises "RuntimeError" if called on a loop that’s been closed. This can happen on a secondary thread when the main application is shutting down. See the concurrency and multithreading section of the documentation. Changed in version 3.7: The *context* keyword-only parameter was added. See **PEP 567** for more details. Note: Most "asyncio" scheduling functions don’t allow passing keyword arguments. To do that, use "functools.partial()": # will schedule "print("Hello", flush=True)" loop.call_soon( functools.partial(print, "Hello", flush=True)) Using partial objects is usually more convenient than using lambdas, as asyncio can render partial objects better in debug and error messages. Scheduling delayed callbacks ---------------------------- Event loop provides mechanisms to schedule callback functions to be called at some point in the future. Event loop uses monotonic clocks to track time. loop.call_later(delay, callback, *args, context=None) Schedule *callback* to be called after the given *delay* number of seconds (can be either an int or a float). An instance of "asyncio.TimerHandle" is returned which can be used to cancel the callback. *callback* will be called exactly once. If two callbacks are scheduled for exactly the same time, the order in which they are called is undefined. The optional positional *args* will be passed to the callback when it is called. If you want the callback to be called with keyword arguments use "functools.partial()". An optional keyword-only *context* argument allows specifying a custom "contextvars.Context" for the *callback* to run in. The current context is used when no *context* is provided. Changed in version 3.7: The *context* keyword-only parameter was added. See **PEP 567** for more details. Changed in version 3.8: In Python 3.7 and earlier with the default event loop implementation, the *delay* could not exceed one day. This has been fixed in Python 3.8. loop.call_at(when, callback, *args, context=None) Schedule *callback* to be called at the given absolute timestamp *when* (an int or a float), using the same time reference as "loop.time()". This method’s behavior is the same as "call_later()". An instance of "asyncio.TimerHandle" is returned which can be used to cancel the callback. Changed in version 3.7: The *context* keyword-only parameter was added. See **PEP 567** for more details. Changed in version 3.8: In Python 3.7 and earlier with the default event loop implementation, the difference between *when* and the current time could not exceed one day. This has been fixed in Python 3.8. loop.time() Return the current time, as a "float" value, according to the event loop’s internal monotonic clock. 
Note: Changed in version 3.8: In Python 3.7 and earlier timeouts (relative *delay* or absolute *when*) should not exceed one day. This has been fixed in Python 3.8. See also: The "asyncio.sleep()" function. Creating Futures and Tasks -------------------------- loop.create_future() Create an "asyncio.Future" object attached to the event loop. This is the preferred way to create Futures in asyncio. This lets third-party event loops provide alternative implementations of the Future object (with better performance or instrumentation). Added in version 3.5.2. loop.create_task(coro, *, name=None, context=None, **kwargs) Schedule the execution of coroutine *coro*. Return a "Task" object. Third-party event loops can use their own subclass of "Task" for interoperability. In this case, the result type is a subclass of "Task". The full function signature is largely the same as that of the "Task" constructor (or factory) - all of the keyword arguments to this function are passed through to that interface, except *name*, or *context* if it is "None". If the *name* argument is provided and not "None", it is set as the name of the task using "Task.set_name()". An optional keyword-only *context* argument allows specifying a custom "contextvars.Context" for the *coro* to run in. The current context copy is created when no *context* is provided. Changed in version 3.8: Added the *name* parameter. Changed in version 3.11: Added the *context* parameter. Changed in version 3.13.3: Added "kwargs" which passes on arbitrary extra parameters, including "name" and "context". Changed in version 3.13.4: Rolled back the change that passes on *name* and *context* (if it is None), while still passing on other arbitrary keyword arguments (to avoid breaking backwards compatibility with 3.13.3). loop.set_task_factory(factory) Set a task factory that will be used by "loop.create_task()". If *factory* is "None" the default task factory will be set. Otherwise, *factory* must be a *callable* with the signature matching "(loop, coro, **kwargs)", where *loop* is a reference to the active event loop, and *coro* is a coroutine object. The callable must pass on all *kwargs*, and return a "asyncio.Task"-compatible object. Changed in version 3.13.3: Required that all *kwargs* are passed on to "asyncio.Task". Changed in version 3.13.4: *name* is no longer passed to task factories. *context* is no longer passed to task factories if it is "None". loop.get_task_factory() Return a task factory or "None" if the default one is in use. Opening network connections --------------------------- async loop.create_connection(protocol_factory, host=None, port=None, *, ssl=None, family=0, proto=0, flags=0, sock=None, local_addr=None, server_hostname=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None, happy_eyeballs_delay=None, interleave=None, all_errors=False) Open a streaming transport connection to a given address specified by *host* and *port*. The socket family can be either "AF_INET" or "AF_INET6" depending on *host* (or the *family* argument, if provided). The socket type will be "SOCK_STREAM". *protocol_factory* must be a callable returning an asyncio protocol implementation. This method will try to establish the connection in the background. When successful, it returns a "(transport, protocol)" pair. The chronological synopsis of the underlying operation is as follows: 1. The connection is established and a transport is created for it. 2. *protocol_factory* is called without arguments and is expected to return a protocol instance. 
3. The protocol instance is coupled with the transport by calling its "connection_made()" method. 4. A "(transport, protocol)" tuple is returned on success. The created transport is an implementation-dependent bidirectional stream. Other arguments: * *ssl*: if given and not false, a SSL/TLS transport is created (by default a plain TCP transport is created). If *ssl* is a "ssl.SSLContext" object, this context is used to create the transport; if *ssl* is "True", a default context returned from "ssl.create_default_context()" is used. See also: SSL/TLS security considerations * *server_hostname* sets or overrides the hostname that the target server’s certificate will be matched against. Should only be passed if *ssl* is not "None". By default the value of the *host* argument is used. If *host* is empty, there is no default and you must pass a value for *server_hostname*. If *server_hostname* is an empty string, hostname matching is disabled (which is a serious security risk, allowing for potential man-in-the-middle attacks). * *family*, *proto*, *flags* are the optional address family, protocol and flags to be passed through to getaddrinfo() for *host* resolution. If given, these should all be integers from the corresponding "socket" module constants. * *happy_eyeballs_delay*, if given, enables Happy Eyeballs for this connection. It should be a floating-point number representing the amount of time in seconds to wait for a connection attempt to complete, before starting the next attempt in parallel. This is the “Connection Attempt Delay” as defined in **RFC 8305**. A sensible default value recommended by the RFC is "0.25" (250 milliseconds). * *interleave* controls address reordering when a host name resolves to multiple IP addresses. If "0" or unspecified, no reordering is done, and addresses are tried in the order returned by "getaddrinfo()". If a positive integer is specified, the addresses are interleaved by address family, and the given integer is interpreted as “First Address Family Count” as defined in **RFC 8305**. The default is "0" if *happy_eyeballs_delay* is not specified, and "1" if it is. * *sock*, if given, should be an existing, already connected "socket.socket" object to be used by the transport. If *sock* is given, none of *host*, *port*, *family*, *proto*, *flags*, *happy_eyeballs_delay*, *interleave* and *local_addr* should be specified. Note: The *sock* argument transfers ownership of the socket to the transport created. To close the socket, call the transport’s "close()" method. * *local_addr*, if given, is a "(local_host, local_port)" tuple used to bind the socket locally. The *local_host* and *local_port* are looked up using "getaddrinfo()", similarly to *host* and *port*. * *ssl_handshake_timeout* is (for a TLS connection) the time in seconds to wait for the TLS handshake to complete before aborting the connection. "60.0" seconds if "None" (default). * *ssl_shutdown_timeout* is the time in seconds to wait for the SSL shutdown to complete before aborting the connection. "30.0" seconds if "None" (default). * *all_errors* determines what exceptions are raised when a connection cannot be created. By default, only a single "Exception" is raised: the first exception if there is only one or all errors have same message, or a single "OSError" with the error messages combined. When "all_errors" is "True", an "ExceptionGroup" will be raised containing all exceptions (even if there is only one). Changed in version 3.5: Added support for SSL/TLS in "ProactorEventLoop". 
Changed in version 3.6: The socket option socket.TCP_NODELAY is set by default for all TCP connections.

Changed in version 3.7: Added the *ssl_handshake_timeout* parameter.

Changed in version 3.8: Added the *happy_eyeballs_delay* and *interleave* parameters.

   Happy Eyeballs Algorithm: Success with Dual-Stack Hosts. When a server’s IPv4 path and protocol are working, but the server’s IPv6 path and protocol are not working, a dual-stack client application experiences significant connection delay compared to an IPv4-only client. This is undesirable because it causes the dual-stack client to have a worse user experience. This document specifies requirements for algorithms that reduce this user-visible delay and provides an algorithm.

   For more information: https://datatracker.ietf.org/doc/html/rfc6555

Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.

Changed in version 3.12: *all_errors* was added.

See also: The "open_connection()" function is a high-level alternative API. It returns a pair of ("StreamReader", "StreamWriter") that can be used directly in async/await code.

async loop.create_datagram_endpoint(protocol_factory, local_addr=None, remote_addr=None, *, family=0, proto=0, flags=0, reuse_port=None, allow_broadcast=None, sock=None)

   Create a datagram connection.

   The socket family can be either "AF_INET", "AF_INET6", or "AF_UNIX", depending on *host* (or the *family* argument, if provided).

   The socket type will be "SOCK_DGRAM".

   *protocol_factory* must be a callable returning a protocol implementation.

   A tuple of "(transport, protocol)" is returned on success.

   Other arguments:

   * *local_addr*, if given, is a "(local_host, local_port)" tuple used to bind the socket locally. The *local_host* and *local_port* are looked up using "getaddrinfo()".

   * *remote_addr*, if given, is a "(remote_host, remote_port)" tuple used to connect the socket to a remote address. The *remote_host* and *remote_port* are looked up using "getaddrinfo()".

   * *family*, *proto*, *flags* are the optional address family, protocol and flags to be passed through to "getaddrinfo()" for *host* resolution. If given, these should all be integers from the corresponding "socket" module constants.

   * *reuse_port* tells the kernel to allow this endpoint to be bound to the same port as other existing endpoints are bound to, so long as they all set this flag when being created. This option is not supported on Windows and some Unixes. If the socket.SO_REUSEPORT constant is not defined then this capability is unsupported.

   * *allow_broadcast* tells the kernel to allow this endpoint to send messages to the broadcast address.

   * *sock* can optionally be specified in order to use a preexisting, already connected, "socket.socket" object to be used by the transport. If specified, *local_addr* and *remote_addr* should be omitted (must be "None").

   Note: The *sock* argument transfers ownership of the socket to the transport created. To close the socket, call the transport’s "close()" method.

   See UDP echo client protocol and UDP echo server protocol examples.

   Changed in version 3.4.4: The *family*, *proto*, *flags*, *reuse_address*, *reuse_port*, *allow_broadcast*, and *sock* parameters were added.

   Changed in version 3.8: Added support for Windows.

   Changed in version 3.8.1: The *reuse_address* parameter is no longer supported, as using socket.SO_REUSEADDR poses a significant security concern for UDP.
Explicitly passing "reuse_address=True" will raise an exception.When multiple processes with differing UIDs assign sockets to an identical UDP socket address with "SO_REUSEADDR", incoming packets can become randomly distributed among the sockets.For supported platforms, *reuse_port* can be used as a replacement for similar functionality. With *reuse_port*, socket.SO_REUSEPORT is used instead, which specifically prevents processes with differing UIDs from assigning sockets to the same socket address. Changed in version 3.11: The *reuse_address* parameter, disabled since Python 3.8.1, 3.7.6 and 3.6.10, has been entirely removed. async loop.create_unix_connection(protocol_factory, path=None, *, ssl=None, sock=None, server_hostname=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None) Create a Unix connection. The socket family will be "AF_UNIX"; socket type will be "SOCK_STREAM". A tuple of "(transport, protocol)" is returned on success. *path* is the name of a Unix domain socket and is required, unless a *sock* parameter is specified. Abstract Unix sockets, "str", "bytes", and "Path" paths are supported. See the documentation of the "loop.create_connection()" method for information about arguments to this method. Availability: Unix. Changed in version 3.7: Added the *ssl_handshake_timeout* parameter. The *path* parameter can now be a *path-like object*. Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter. Creating network servers ------------------------ async loop.create_server(protocol_factory, host=None, port=None, *, family=socket.AF_UNSPEC, flags=socket.AI_PASSIVE, sock=None, backlog=100, ssl=None, reuse_address=None, reuse_port=None, keep_alive=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None, start_serving=True) Create a TCP server (socket type "SOCK_STREAM") listening on *port* of the *host* address. Returns a "Server" object. Arguments: * *protocol_factory* must be a callable returning a protocol implementation. * The *host* parameter can be set to several types which determine where the server would be listening: * If *host* is a string, the TCP server is bound to a single network interface specified by *host*. * If *host* is a sequence of strings, the TCP server is bound to all network interfaces specified by the sequence. * If *host* is an empty string or "None", all interfaces are assumed and a list of multiple sockets will be returned (most likely one for IPv4 and another one for IPv6). * The *port* parameter can be set to specify which port the server should listen on. If "0" or "None" (the default), a random unused port will be selected (note that if *host* resolves to multiple network interfaces, a different random port will be selected for each interface). * *family* can be set to either "socket.AF_INET" or "AF_INET6" to force the socket to use IPv4 or IPv6. If not set, the *family* will be determined from host name (defaults to "AF_UNSPEC"). * *flags* is a bitmask for "getaddrinfo()". * *sock* can optionally be specified in order to use a preexisting socket object. If specified, *host* and *port* must not be specified. Note: The *sock* argument transfers ownership of the socket to the server created. To close the socket, call the server’s "close()" method. * *backlog* is the maximum number of queued connections passed to "listen()" (defaults to 100). * *ssl* can be set to an "SSLContext" instance to enable TLS over the accepted connections. 
   * *reuse_address* tells the kernel to reuse a local socket in "TIME_WAIT" state, without waiting for its natural timeout to expire. If not specified will automatically be set to "True" on Unix.

   * *reuse_port* tells the kernel to allow this endpoint to be bound to the same port as other existing endpoints are bound to, so long as they all set this flag when being created. This option is not supported on Windows.

   * *keep_alive* set to "True" keeps connections active by enabling the periodic transmission of messages.

   Changed in version 3.13: Added the *keep_alive* parameter.

   * *ssl_handshake_timeout* is (for a TLS server) the time in seconds to wait for the TLS handshake to complete before aborting the connection. "60.0" seconds if "None" (default).

   * *ssl_shutdown_timeout* is the time in seconds to wait for the SSL shutdown to complete before aborting the connection. "30.0" seconds if "None" (default).

   * *start_serving* set to "True" (the default) causes the created server to start accepting connections immediately. When set to "False", the user should await on "Server.start_serving()" or "Server.serve_forever()" to make the server start accepting connections.

   Changed in version 3.5: Added support for SSL/TLS in "ProactorEventLoop".

   Changed in version 3.5.1: The *host* parameter can be a sequence of strings.

   Changed in version 3.6: Added *ssl_handshake_timeout* and *start_serving* parameters. The socket option socket.TCP_NODELAY is set by default for all TCP connections.

   Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.

   See also: The "start_server()" function is a higher-level alternative API that returns a pair of "StreamReader" and "StreamWriter" that can be used in an async/await code.

async loop.create_unix_server(protocol_factory, path=None, *, sock=None, backlog=100, ssl=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None, start_serving=True, cleanup_socket=True)

   Similar to "loop.create_server()" but works with the "AF_UNIX" socket family.

   *path* is the name of a Unix domain socket, and is required, unless a *sock* argument is provided. Abstract Unix sockets, "str", "bytes", and "Path" paths are supported.

   If *cleanup_socket* is true then the Unix socket will automatically be removed from the filesystem when the server is closed, unless the socket has been replaced after the server has been created.

   See the documentation of the "loop.create_server()" method for information about arguments to this method.

   Availability: Unix.

   Changed in version 3.7: Added the *ssl_handshake_timeout* and *start_serving* parameters. The *path* parameter can now be a "Path" object.

   Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.

   Changed in version 3.13: Added the *cleanup_socket* parameter.

async loop.connect_accepted_socket(protocol_factory, sock, *, ssl=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None)

   Wrap an already accepted connection into a transport/protocol pair.

   This method can be used by servers that accept connections outside of asyncio but that use asyncio to handle them.

   Parameters:

   * *protocol_factory* must be a callable returning a protocol implementation.

   * *sock* is a preexisting socket object returned from "socket.accept".

     Note: The *sock* argument transfers ownership of the socket to the transport created. To close the socket, call the transport’s "close()" method.

   * *ssl* can be set to an "SSLContext" to enable SSL over the accepted connections.

   * *ssl_handshake_timeout* is (for an SSL connection) the time in seconds to wait for the SSL handshake to complete before aborting the connection. "60.0" seconds if "None" (default).

   * *ssl_shutdown_timeout* is the time in seconds to wait for the SSL shutdown to complete before aborting the connection. "30.0" seconds if "None" (default).

   Returns a "(transport, protocol)" pair.

   Added in version 3.5.3.

   Changed in version 3.7: Added the *ssl_handshake_timeout* parameter.

   Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.
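As a hedged illustration of "loop.create_server()" with a minimal "asyncio.Protocol" (the echo behaviour, address and port are arbitrary choices, not from the original text):

   import asyncio

   class EchoProtocol(asyncio.Protocol):
       def connection_made(self, transport):
           self.transport = transport

       def data_received(self, data):
           # Echo received bytes back to the client.
           self.transport.write(data)

   async def main():
       loop = asyncio.get_running_loop()
       server = await loop.create_server(EchoProtocol, '127.0.0.1', 8888)
       async with server:
           # Serves until cancelled (e.g. with Ctrl-C).
           await server.serve_forever()

   asyncio.run(main())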
Transferring files
------------------

async loop.sendfile(transport, file, offset=0, count=None, *, fallback=True)

   Send a *file* over a *transport*. Return the total number of bytes sent.

   The method uses high-performance "os.sendfile()" if available.

   *file* must be a regular file object opened in binary mode.

   *offset* tells from where to start reading the file. If specified, *count* is the total number of bytes to transmit as opposed to sending the file until EOF is reached. File position is always updated, even when this method raises an error, and "file.tell()" can be used to obtain the actual number of bytes sent.

   *fallback* set to "True" makes asyncio manually read and send the file when the platform does not support the sendfile system call (e.g. Windows or SSL socket on Unix).

   Raise "SendfileNotAvailableError" if the system does not support the *sendfile* syscall and *fallback* is "False".

   Added in version 3.7.

TLS Upgrade
-----------

async loop.start_tls(transport, protocol, sslcontext, *, server_side=False, server_hostname=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None)

   Upgrade an existing transport-based connection to TLS.

   Create a TLS coder/decoder instance and insert it between the *transport* and the *protocol*. The coder/decoder implements both *transport*-facing protocol and *protocol*-facing transport.

   Return the created two-interface instance. After *await*, the *protocol* must stop using the original *transport* and communicate with the returned object only because the coder caches *protocol*-side data and sporadically exchanges extra TLS session packets with *transport*.

   In some situations (e.g. when the passed transport is already closing) this may return "None".

   Parameters:

   * *transport* and *protocol* instances that methods like "create_server()" and "create_connection()" return.

   * *sslcontext*: a configured instance of "SSLContext".

   * *server_side* pass "True" when a server-side connection is being upgraded (like the one created by "create_server()").

   * *server_hostname*: sets or overrides the host name that the target server’s certificate will be matched against.

   * *ssl_handshake_timeout* is (for a TLS connection) the time in seconds to wait for the TLS handshake to complete before aborting the connection. "60.0" seconds if "None" (default).

   * *ssl_shutdown_timeout* is the time in seconds to wait for the SSL shutdown to complete before aborting the connection. "30.0" seconds if "None" (default).

   Added in version 3.7.

   Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.

Watching file descriptors
-------------------------

loop.add_reader(fd, callback, *args)

   Start monitoring the *fd* file descriptor for read availability and invoke *callback* with the specified arguments once *fd* is available for reading.

   Any preexisting callback registered for *fd* is cancelled and replaced by *callback*.

loop.remove_reader(fd)

   Stop monitoring the *fd* file descriptor for read availability.
Returns "True" if *fd* was previously being monitored for reads. loop.add_writer(fd, callback, *args) Start monitoring the *fd* file descriptor for write availability and invoke *callback* with the specified arguments once *fd* is available for writing. Any preexisting callback registered for *fd* is cancelled and replaced by *callback*. Use "functools.partial()" to pass keyword arguments to *callback*. loop.remove_writer(fd) Stop monitoring the *fd* file descriptor for write availability. Returns "True" if *fd* was previously being monitored for writes. See also Platform Support section for some limitations of these methods. Working with socket objects directly ------------------------------------ In general, protocol implementations that use transport-based APIs such as "loop.create_connection()" and "loop.create_server()" are faster than implementations that work with sockets directly. However, there are some use cases when performance is not critical, and working with "socket" objects directly is more convenient. async loop.sock_recv(sock, nbytes) Receive up to *nbytes* from *sock*. Asynchronous version of "socket.recv()". Return the received data as a bytes object. *sock* must be a non-blocking socket. Changed in version 3.7: Even though this method was always documented as a coroutine method, releases before Python 3.7 returned a "Future". Since Python 3.7 this is an "async def" method. async loop.sock_recv_into(sock, buf) Receive data from *sock* into the *buf* buffer. Modeled after the blocking "socket.recv_into()" method. Return the number of bytes written to the buffer. *sock* must be a non-blocking socket. Added in version 3.7. async loop.sock_recvfrom(sock, bufsize) Receive a datagram of up to *bufsize* from *sock*. Asynchronous version of "socket.recvfrom()". Return a tuple of (received data, remote address). *sock* must be a non-blocking socket. Added in version 3.11. async loop.sock_recvfrom_into(sock, buf, nbytes=0) Receive a datagram of up to *nbytes* from *sock* into *buf*. Asynchronous version of "socket.recvfrom_into()". Return a tuple of (number of bytes received, remote address). *sock* must be a non-blocking socket. Added in version 3.11. async loop.sock_sendall(sock, data) Send *data* to the *sock* socket. Asynchronous version of "socket.sendall()". This method continues to send to the socket until either all data in *data* has been sent or an error occurs. "None" is returned on success. On error, an exception is raised. Additionally, there is no way to determine how much data, if any, was successfully processed by the receiving end of the connection. *sock* must be a non-blocking socket. Changed in version 3.7: Even though the method was always documented as a coroutine method, before Python 3.7 it returned a "Future". Since Python 3.7, this is an "async def" method. async loop.sock_sendto(sock, data, address) Send a datagram from *sock* to *address*. Asynchronous version of "socket.sendto()". Return the number of bytes sent. *sock* must be a non-blocking socket. Added in version 3.11. async loop.sock_connect(sock, address) Connect *sock* to a remote socket at *address*. Asynchronous version of "socket.connect()". *sock* must be a non-blocking socket. Changed in version 3.5.2: "address" no longer needs to be resolved. "sock_connect" will try to check if the *address* is already resolved by calling "socket.inet_pton()". If not, "loop.getaddrinfo()" will be used to resolve the *address*. See also: "loop.create_connection()" and "asyncio.open_connection()". 
async loop.sock_accept(sock)

   Accept a connection. Modeled after the blocking "socket.accept()" method.

   The socket must be bound to an address and listening for connections. The return value is a pair "(conn, address)" where *conn* is a *new* socket object usable to send and receive data on the connection, and *address* is the address bound to the socket on the other end of the connection.

   *sock* must be a non-blocking socket.

   Changed in version 3.7: Even though the method was always documented as a coroutine method, before Python 3.7 it returned a "Future". Since Python 3.7, this is an "async def" method.

   See also: "loop.create_server()" and "start_server()".

async loop.sock_sendfile(sock, file, offset=0, count=None, *, fallback=True)

   Send a file using high-performance "os.sendfile" if possible. Return the total number of bytes sent.

   Asynchronous version of "socket.sendfile()".

   *sock* must be a non-blocking "socket.SOCK_STREAM" "socket".

   *file* must be a regular file object open in binary mode.

   *offset* tells from where to start reading the file. If specified, *count* is the total number of bytes to transmit as opposed to sending the file until EOF is reached. File position is always updated, even when this method raises an error, and "file.tell()" can be used to obtain the actual number of bytes sent.

   *fallback*, when set to "True", makes asyncio manually read and send the file when the platform does not support the sendfile syscall (e.g. Windows or SSL socket on Unix).

   Raise "SendfileNotAvailableError" if the system does not support *sendfile* syscall and *fallback* is "False".

   *sock* must be a non-blocking socket.

   Added in version 3.7.

DNS
---

async loop.getaddrinfo(host, port, *, family=0, type=0, proto=0, flags=0)

   Asynchronous version of "socket.getaddrinfo()".

async loop.getnameinfo(sockaddr, flags=0)

   Asynchronous version of "socket.getnameinfo()".

Note: Both *getaddrinfo* and *getnameinfo* internally utilize their synchronous versions through the loop’s default thread pool executor. When this executor is saturated, these methods may experience delays, which higher-level networking libraries may report as increased timeouts. To mitigate this, consider using a custom executor for other user tasks, or setting a default executor with a larger number of workers.

Changed in version 3.7: Both *getaddrinfo* and *getnameinfo* methods were always documented to return a coroutine, but prior to Python 3.7 they were, in fact, returning "asyncio.Future" objects. Starting with Python 3.7 both methods are coroutines.

Working with pipes
------------------

async loop.connect_read_pipe(protocol_factory, pipe)

   Register the read end of *pipe* in the event loop.

   *protocol_factory* must be a callable returning an asyncio protocol implementation.

   *pipe* is a *file-like object*.

   Return pair "(transport, protocol)", where *transport* supports the "ReadTransport" interface and *protocol* is an object instantiated by the *protocol_factory*.

   With "SelectorEventLoop" event loop, the *pipe* is set to non-blocking mode.

async loop.connect_write_pipe(protocol_factory, pipe)

   Register the write end of *pipe* in the event loop.

   *protocol_factory* must be a callable returning an asyncio protocol implementation.

   *pipe* is a *file-like object*.

   Return pair "(transport, protocol)", where *transport* supports the "WriteTransport" interface and *protocol* is an object instantiated by the *protocol_factory*.

   With "SelectorEventLoop" event loop, the *pipe* is set to non-blocking mode.
Note: "SelectorEventLoop" does not support the above methods on Windows. Use "ProactorEventLoop" instead for Windows. See also: The "loop.subprocess_exec()" and "loop.subprocess_shell()" methods. Unix signals ------------ loop.add_signal_handler(signum, callback, *args) Set *callback* as the handler for the *signum* signal. The callback will be invoked by *loop*, along with other queued callbacks and runnable coroutines of that event loop. Unlike signal handlers registered using "signal.signal()", a callback registered with this function is allowed to interact with the event loop. Raise "ValueError" if the signal number is invalid or uncatchable. Raise "RuntimeError" if there is a problem setting up the handler. Use "functools.partial()" to pass keyword arguments to *callback*. Like "signal.signal()", this function must be invoked in the main thread. loop.remove_signal_handler(sig) Remove the handler for the *sig* signal. Return "True" if the signal handler was removed, or "False" if no handler was set for the given signal. Availability: Unix. See also: The "signal" module. Executing code in thread or process pools ----------------------------------------- awaitable loop.run_in_executor(executor, func, *args) Arrange for *func* to be called in the specified executor. The *executor* argument should be an "concurrent.futures.Executor" instance. The default executor is used if *executor* is "None". The default executor can be set by "loop.set_default_executor()", otherwise, a "concurrent.futures.ThreadPoolExecutor" will be lazy- initialized and used by "run_in_executor()" if needed. Example: import asyncio import concurrent.futures def blocking_io(): # File operations (such as logging) can block the # event loop: run them in a thread pool. with open('/dev/urandom', 'rb') as f: return f.read(100) def cpu_bound(): # CPU-bound operations will block the event loop: # in general it is preferable to run them in a # process pool. return sum(i * i for i in range(10 ** 7)) async def main(): loop = asyncio.get_running_loop() ## Options: # 1. Run in the default loop's executor: result = await loop.run_in_executor( None, blocking_io) print('default thread pool', result) # 2. Run in a custom thread pool: with concurrent.futures.ThreadPoolExecutor() as pool: result = await loop.run_in_executor( pool, blocking_io) print('custom thread pool', result) # 3. Run in a custom process pool: with concurrent.futures.ProcessPoolExecutor() as pool: result = await loop.run_in_executor( pool, cpu_bound) print('custom process pool', result) if __name__ == '__main__': asyncio.run(main()) Note that the entry point guard ("if __name__ == '__main__'") is required for option 3 due to the peculiarities of "multiprocessing", which is used by "ProcessPoolExecutor". See Safe importing of main module. This method returns a "asyncio.Future" object. Use "functools.partial()" to pass keyword arguments to *func*. Changed in version 3.5.3: "loop.run_in_executor()" no longer configures the "max_workers" of the thread pool executor it creates, instead leaving it up to the thread pool executor ("ThreadPoolExecutor") to set the default. loop.set_default_executor(executor) Set *executor* as the default executor used by "run_in_executor()". *executor* must be an instance of "ThreadPoolExecutor". Changed in version 3.11: *executor* must be an instance of "ThreadPoolExecutor". Error Handling API ------------------ Allows customizing how exceptions are handled in the event loop. 
loop.set_exception_handler(handler)

   Set *handler* as the new event loop exception handler.

   If *handler* is "None", the default exception handler will be set. Otherwise, *handler* must be a callable with the signature matching "(loop, context)", where "loop" is a reference to the active event loop, and "context" is a "dict" object containing the details of the exception (see "call_exception_handler()" documentation for details about context).

   If the handler is called on behalf of a "Task" or "Handle", it is run in the "contextvars.Context" of that task or callback handle.

   Changed in version 3.12: The handler may be called in the "Context" of the task or handle where the exception originated.

loop.get_exception_handler()

   Return the current exception handler, or "None" if no custom exception handler was set.

   Added in version 3.5.2.

loop.default_exception_handler(context)

   Default exception handler.

   This is called when an exception occurs and no exception handler is set. This can be called by a custom exception handler that wants to defer to the default handler behavior.

   *context* parameter has the same meaning as in "call_exception_handler()".

loop.call_exception_handler(context)

   Call the current event loop exception handler.

   *context* is a "dict" object containing the following keys (new keys may be introduced in future Python versions):

   * ‘message’: Error message;

   * ‘exception’ (optional): Exception object;

   * ‘future’ (optional): "asyncio.Future" instance;

   * ‘task’ (optional): "asyncio.Task" instance;

   * ‘handle’ (optional): "asyncio.Handle" instance;

   * ‘protocol’ (optional): Protocol instance;

   * ‘transport’ (optional): Transport instance;

   * ‘socket’ (optional): "socket.socket" instance;

   * ‘source_traceback’ (optional): Traceback of the source;

   * ‘handle_traceback’ (optional): Traceback of the handle;

   * ‘asyncgen’ (optional): Asynchronous generator that caused the exception.

   Note: This method should not be overridden in subclassed event loops. For custom exception handling, use the "set_exception_handler()" method.
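The following sketch (an illustrative addition; the handler name is made up, and the handler is typically invoked when the failed task is garbage collected) shows a custom handler that logs the error message and then defers to the default behaviour:

   import asyncio

   def custom_exception_handler(loop, context):
       # 'message' is always present; other keys are optional.
       print("caught:", context["message"])
       # Defer to the default handler for the standard report.
       loop.default_exception_handler(context)

   async def bug():
       raise RuntimeError("boom")

   async def main():
       loop = asyncio.get_running_loop()
       loop.set_exception_handler(custom_exception_handler)
       asyncio.create_task(bug())   # exception is never retrieved
       await asyncio.sleep(0.1)

   asyncio.run(main())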
   Together, string arguments form the "argv" of the program.

   This is similar to the standard library "subprocess.Popen" class
   called with "shell=False" and the list of strings passed as the
   first argument; however, where "Popen" takes a single argument
   which is a list of strings, *subprocess_exec* takes multiple
   string arguments.

   The *protocol_factory* must be a callable returning an instance
   of a subclass of the "asyncio.SubprocessProtocol" class.

   Other parameters:

   * *stdin* can be any of these:

     * a file-like object

     * an existing file descriptor (a positive integer), for example
       those created with "os.pipe()"

     * the "subprocess.PIPE" constant (default) which will create a
       new pipe and connect it,

     * the value "None" which will make the subprocess inherit the
       file descriptor from this process

     * the "subprocess.DEVNULL" constant which indicates that the
       special "os.devnull" file will be used

   * *stdout* can be any of these:

     * a file-like object

     * the "subprocess.PIPE" constant (default) which will create a
       new pipe and connect it,

     * the value "None" which will make the subprocess inherit the
       file descriptor from this process

     * the "subprocess.DEVNULL" constant which indicates that the
       special "os.devnull" file will be used

   * *stderr* can be any of these:

     * a file-like object

     * the "subprocess.PIPE" constant (default) which will create a
       new pipe and connect it,

     * the value "None" which will make the subprocess inherit the
       file descriptor from this process

     * the "subprocess.DEVNULL" constant which indicates that the
       special "os.devnull" file will be used

     * the "subprocess.STDOUT" constant which will connect the
       standard error stream to the process’ standard output stream

   * All other keyword arguments are passed to "subprocess.Popen"
     without interpretation, except for *bufsize*,
     *universal_newlines*, *shell*, *text*, *encoding* and *errors*,
     which should not be specified at all.

     The "asyncio" subprocess API does not support decoding the
     streams as text. "bytes.decode()" can be used to convert the
     bytes returned from the stream to text.

   If a file-like object passed as *stdin*, *stdout* or *stderr*
   represents a pipe, then the other side of this pipe should be
   registered with "connect_write_pipe()" or "connect_read_pipe()"
   for use with the event loop.

   See the constructor of the "subprocess.Popen" class for
   documentation on other arguments.

   Returns a pair of "(transport, protocol)", where *transport*
   conforms to the "asyncio.SubprocessTransport" base class and
   *protocol* is an object instantiated by the *protocol_factory*.

async loop.subprocess_shell(protocol_factory, cmd, *, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, **kwargs)

   Create a subprocess from *cmd*, which can be a "str" or a "bytes"
   string encoded to the filesystem encoding, using the platform’s
   “shell” syntax.

   This is similar to the standard library "subprocess.Popen" class
   called with "shell=True".

   The *protocol_factory* must be a callable returning an instance
   of a subclass of the "SubprocessProtocol" class.

   See "subprocess_exec()" for more details about the remaining
   arguments.

   Returns a pair of "(transport, protocol)", where *transport*
   conforms to the "SubprocessTransport" base class and *protocol*
   is an object instantiated by the *protocol_factory*.

   Note: It is the application’s responsibility to ensure that all
     whitespace and special characters are quoted appropriately to
     avoid shell injection vulnerabilities.
   The "shlex.quote()" function can be used to properly escape
   whitespace and special characters in strings that are going to be
   used to construct shell commands.

Callback Handles
================

class asyncio.Handle

   A callback wrapper object returned by "loop.call_soon()" and
   "loop.call_soon_threadsafe()".

   get_context()

      Return the "contextvars.Context" object associated with the
      handle.

      Added in version 3.12.

   cancel()

      Cancel the callback. If the callback has already been canceled
      or executed, this method has no effect.

   cancelled()

      Return "True" if the callback was cancelled.

      Added in version 3.7.

class asyncio.TimerHandle

   A callback wrapper object returned by "loop.call_later()" and
   "loop.call_at()".

   This class is a subclass of "Handle".

   when()

      Return a scheduled callback time as "float" seconds.

      The time is an absolute timestamp, using the same time
      reference as "loop.time()".

      Added in version 3.7.

Server Objects
==============

Server objects are created by the "loop.create_server()",
"loop.create_unix_server()", "start_server()", and
"start_unix_server()" functions.

Do not instantiate the "Server" class directly.

class asyncio.Server

   *Server* objects are asynchronous context managers. When used in
   an "async with" statement, it’s guaranteed that the Server object
   is closed and not accepting new connections when the "async with"
   statement is completed:

      srv = await loop.create_server(...)

      async with srv:
          # some code

      # At this point, srv is closed and no longer accepts new
      # connections.

   Changed in version 3.7: Server object is an asynchronous context
   manager since Python 3.7.

   Changed in version 3.11: This class was exposed publicly as
   "asyncio.Server" in Python 3.9.11, 3.10.3 and 3.11.

   close()

      Stop serving: close listening sockets and set the "sockets"
      attribute to "None".

      The sockets that represent existing incoming client
      connections are left open.

      The server is closed asynchronously; use the "wait_closed()"
      coroutine to wait until the server is closed (and no more
      connections are active).

   close_clients()

      Close all existing incoming client connections.

      Calls "close()" on all associated transports.

      "close()" should be called before "close_clients()" when
      closing the server to avoid races with new clients connecting.

      Added in version 3.13.

   abort_clients()

      Close all existing incoming client connections immediately,
      without waiting for pending operations to complete.

      Calls "abort()" on all associated transports.

      "close()" should be called before "abort_clients()" when
      closing the server to avoid races with new clients connecting.

      Added in version 3.13.

   get_loop()

      Return the event loop associated with the server object.

      Added in version 3.7.

   async start_serving()

      Start accepting connections.

      This method is idempotent, so it can be called when the server
      is already serving.

      The *start_serving* keyword-only parameter to
      "loop.create_server()" and "asyncio.start_server()" allows
      creating a Server object that is not accepting connections
      initially. In this case "Server.start_serving()", or
      "Server.serve_forever()" can be used to make the Server start
      accepting connections.

      Added in version 3.7.

   async serve_forever()

      Start accepting connections until the coroutine is cancelled.
      Cancellation of the "serve_forever" task causes the server to
      be closed.

      This method can be called if the server is already accepting
      connections. Only one "serve_forever" task can exist per
      *Server* object.

      Example:

         async def client_connected(reader, writer):
             # Communicate with the client with
             # reader/writer streams.  For example:
             await reader.readline()

         async def main(host, port):
             srv = await asyncio.start_server(
                 client_connected, host, port)
             await srv.serve_forever()

         asyncio.run(main('127.0.0.1', 0))

      Added in version 3.7.

   is_serving()

      Return "True" if the server is accepting new connections.

      Added in version 3.7.

   async wait_closed()

      Wait until the "close()" method completes and all active
      connections have finished.

   sockets

      List of socket-like objects, "asyncio.trsock.TransportSocket",
      which the server is listening on.

      Changed in version 3.7: Prior to Python 3.7 "Server.sockets"
      used to return an internal list of server sockets directly. In
      3.7 a copy of that list is returned.

Event Loop Implementations
==========================

asyncio ships with two different event loop implementations:
"SelectorEventLoop" and "ProactorEventLoop".

By default asyncio is configured to use "EventLoop".

class asyncio.SelectorEventLoop

   A subclass of "AbstractEventLoop" based on the "selectors"
   module.

   Uses the most efficient *selector* available for the given
   platform. It is also possible to manually configure the exact
   selector implementation to be used:

      import asyncio
      import selectors

      class MyPolicy(asyncio.DefaultEventLoopPolicy):
          def new_event_loop(self):
              selector = selectors.SelectSelector()
              return asyncio.SelectorEventLoop(selector)

      asyncio.set_event_loop_policy(MyPolicy())

   Availability: Unix, Windows.

class asyncio.ProactorEventLoop

   A subclass of "AbstractEventLoop" for Windows that uses “I/O
   Completion Ports” (IOCP).

   Availability: Windows.

   See also: MSDN documentation on I/O Completion Ports.

class asyncio.EventLoop

   An alias to the most efficient available subclass of
   "AbstractEventLoop" for the given platform.

   It is an alias to "SelectorEventLoop" on Unix and
   "ProactorEventLoop" on Windows.

   Added in version 3.13.

class asyncio.AbstractEventLoop

   Abstract base class for asyncio-compliant event loops.

   The Event Loop Methods section lists all methods that an
   alternative implementation of "AbstractEventLoop" should have
   defined.

Examples
========

Note that all examples in this section **purposefully** show how to
use the low-level event loop APIs, such as "loop.run_forever()" and
"loop.call_soon()". Modern asyncio applications rarely need to be
written this way; consider using the high-level functions like
"asyncio.run()".

Hello World with call_soon()
----------------------------

An example using the "loop.call_soon()" method to schedule a
callback. The callback displays "Hello World" and then stops the
event loop:

   import asyncio

   def hello_world(loop):
       """A callback to print 'Hello World' and stop the event loop"""
       print('Hello World')
       loop.stop()

   loop = asyncio.new_event_loop()

   # Schedule a call to hello_world()
   loop.call_soon(hello_world, loop)

   # Blocking call interrupted by loop.stop()
   try:
       loop.run_forever()
   finally:
       loop.close()

See also: A similar Hello World example created with a coroutine and
the "run()" function.

Display the current date with call_later()
------------------------------------------

An example of a callback displaying the current date every second.
The callback uses the "loop.call_later()" method to reschedule
itself every second; after 5 seconds, it stops the event loop:

   import asyncio
   import datetime

   def display_date(end_time, loop):
       print(datetime.datetime.now())
       if (loop.time() + 1.0) < end_time:
           loop.call_later(1, display_date, end_time, loop)
       else:
           loop.stop()

   loop = asyncio.new_event_loop()

   # Schedule the first call to display_date()
   end_time = loop.time() + 5.0
   loop.call_soon(display_date, end_time, loop)

   # Blocking call interrupted by loop.stop()
   try:
       loop.run_forever()
   finally:
       loop.close()

See also: A similar current date example created with a coroutine
and the "run()" function.

Watch a file descriptor for read events
---------------------------------------

Wait until a file descriptor has received some data using the
"loop.add_reader()" method and then close the event loop:

   import asyncio
   from socket import socketpair

   # Create a pair of connected file descriptors
   rsock, wsock = socketpair()

   loop = asyncio.new_event_loop()

   def reader():
       data = rsock.recv(100)
       print("Received:", data.decode())

       # We are done: unregister the file descriptor
       loop.remove_reader(rsock)

       # Stop the event loop
       loop.stop()

   # Register the file descriptor for read event
   loop.add_reader(rsock, reader)

   # Simulate the reception of data from the network
   loop.call_soon(wsock.send, 'abc'.encode())

   try:
       # Run the event loop
       loop.run_forever()
   finally:
       # We are done. Close sockets and the event loop.
       rsock.close()
       wsock.close()
       loop.close()

See also:

* A similar example using transports, protocols, and the
  "loop.create_connection()" method.

* Another similar example using the high-level
  "asyncio.open_connection()" function and streams.

Set signal handlers for SIGINT and SIGTERM
------------------------------------------

(This "signals" example only works on Unix.)

Register handlers for signals "SIGINT" and "SIGTERM" using the
"loop.add_signal_handler()" method:

   import asyncio
   import functools
   import os
   import signal

   def ask_exit(signame, loop):
       print("got signal %s: exit" % signame)
       loop.stop()

   async def main():
       loop = asyncio.get_running_loop()

       for signame in {'SIGINT', 'SIGTERM'}:
           loop.add_signal_handler(
               getattr(signal, signame),
               functools.partial(ask_exit, signame, loop))

       await asyncio.sleep(3600)

   print("Event loop running for 1 hour, press Ctrl+C to interrupt.")
   print(f"pid {os.getpid()}: send SIGINT or SIGTERM to exit.")

   asyncio.run(main())

Exceptions
**********

**Source code:** Lib/asyncio/exceptions.py

======================================================================

exception asyncio.TimeoutError

   A deprecated alias of "TimeoutError", raised when the operation
   has exceeded the given deadline.

   Changed in version 3.11: This class was made an alias of
   "TimeoutError".

exception asyncio.CancelledError

   The operation has been cancelled.

   This exception can be caught to perform custom operations when
   asyncio Tasks are cancelled. In almost all situations the
   exception must be re-raised.

   Changed in version 3.8: "CancelledError" is now a subclass of
   "BaseException" rather than "Exception".

exception asyncio.InvalidStateError

   Invalid internal state of "Task" or "Future".

   Can be raised in situations like setting a result value for a
   *Future* object that already has a result value set.

exception asyncio.SendfileNotAvailableError

   The “sendfile” syscall is not available for the given socket or
   file type.

   A subclass of "RuntimeError".

exception asyncio.IncompleteReadError

   The requested read operation did not complete fully.

   Raised by the asyncio stream APIs.
   This exception is a subclass of "EOFError".

   expected

      The total number ("int") of expected bytes.

   partial

      A string of "bytes" read before the end of stream was reached.

exception asyncio.LimitOverrunError

   Reached the buffer size limit while looking for a separator.

   Raised by the asyncio stream APIs.

   consumed

      The total number of bytes to be consumed.

Extending
*********

The main way to extend "asyncio" is to write custom *event loop*
classes. Asyncio provides helpers that can be used to simplify this
task.

Note: Third parties should reuse existing asyncio code with
  caution: a new Python version is free to break backward
  compatibility in the *internal* parts of the API.

Writing a Custom Event Loop
===========================

"asyncio.AbstractEventLoop" declares many methods. Implementing all
of them from scratch is a tedious job.

A loop can get many common method implementations for free by
inheriting from "asyncio.BaseEventLoop".

In turn, the subclass should implement a number of *private* methods
declared but not implemented in "asyncio.BaseEventLoop".

For example, "loop.create_connection()" checks arguments, resolves
DNS addresses, and calls "loop._make_socket_transport()", which
should be implemented by the inheriting class. The
"_make_socket_transport()" method is not documented and is
considered an *internal* API.

Future and Task private constructors
====================================

"asyncio.Future" and "asyncio.Task" should never be created
directly; please use the corresponding "loop.create_future()" and
"loop.create_task()", or "asyncio.create_task()" factories instead.

However, third-party *event loops* may *reuse* the built-in future
and task implementations for the sake of getting complex and highly
optimized code for free.

For this purpose the following *private* constructors are listed:

Future.__init__(*, loop=None)

   Create a built-in future instance.

   *loop* is an optional event loop instance.

Task.__init__(coro, *, loop=None, name=None, context=None)

   Create a built-in task instance.

   *loop* is an optional event loop instance. The rest of the
   arguments are described in the "loop.create_task()" description.

   Changed in version 3.11: The *context* argument was added.

Task lifetime support
=====================

A third-party task implementation should call the following
functions to keep a task visible to "asyncio.all_tasks()" and
"asyncio.current_task()":

asyncio._register_task(task)

   Register a new *task* as managed by *asyncio*.

   Call the function from a task constructor.

asyncio._unregister_task(task)

   Unregister a *task* from *asyncio* internal structures.

   The function should be called when a task is about to finish.

asyncio._enter_task(loop, task)

   Switch the current task to the *task* argument.

   Call the function just before executing a portion of the embedded
   *coroutine* ("coroutine.send()" or "coroutine.throw()").

asyncio._leave_task(loop, task)

   Switch the current task back from *task* to "None".

   Call the function just after "coroutine.send()" or
   "coroutine.throw()" execution.

Futures
*******

**Source code:** Lib/asyncio/futures.py,
Lib/asyncio/base_futures.py

======================================================================

*Future* objects are used to bridge **low-level callback-based
code** with high-level async/await code.

Future Functions
================

asyncio.isfuture(obj)

   Return "True" if *obj* is any of:

   * an instance of "asyncio.Future",

   * an instance of "asyncio.Task",

   * a Future-like object with a "_asyncio_future_blocking"
     attribute.

   Added in version 3.5.
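A quick sketch of how these checks behave; the "coro" name is our
own, and "ensure_future()" is documented next:

   import asyncio

   async def coro():
       return 42

   async def main():
       c = coro()
       print(asyncio.isfuture(c))       # False: a bare coroutine
       task = asyncio.ensure_future(c)  # wrap it in a Task
       print(asyncio.isfuture(task))    # True: Task is Future-like
       fut = asyncio.get_running_loop().create_future()
       print(asyncio.isfuture(fut))     # True: a plain Future
       await task

   asyncio.run(main())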
asyncio.ensure_future(obj, *, loop=None)

   Return:

   * *obj* argument as is, if *obj* is a "Future", a "Task", or a
     Future-like object ("isfuture()" is used for the test.)

   * a "Task" object wrapping *obj*, if *obj* is a coroutine
     ("iscoroutine()" is used for the test); in this case the
     coroutine will be scheduled by "ensure_future()".

   * a "Task" object that would await on *obj*, if *obj* is an
     awaitable ("inspect.isawaitable()" is used for the test.)

   If *obj* is none of the above, a "TypeError" is raised.

   Important: See also the "create_task()" function, which is the
     preferred way for creating new Tasks. Save a reference to the
     result of this function, to avoid a task disappearing
     mid-execution.

   Changed in version 3.5.1: The function accepts any *awaitable*
   object.

   Deprecated since version 3.10: Deprecation warning is emitted if
   *obj* is not a Future-like object and *loop* is not specified and
   there is no running event loop.

asyncio.wrap_future(future, *, loop=None)

   Wrap a "concurrent.futures.Future" object in an "asyncio.Future"
   object.

   Deprecated since version 3.10: Deprecation warning is emitted if
   *future* is not a Future-like object and *loop* is not specified
   and there is no running event loop.

Future Object
=============

class asyncio.Future(*, loop=None)

   A Future represents an eventual result of an asynchronous
   operation. Not thread-safe.

   Future is an *awaitable* object. Coroutines can await on Future
   objects until they either have a result or an exception set, or
   until they are cancelled. A Future can be awaited multiple times
   and the result is the same.

   Typically Futures are used to enable low-level callback-based
   code (e.g. in protocols implemented using asyncio transports) to
   interoperate with high-level async/await code.

   The rule of thumb is to never expose Future objects in
   user-facing APIs, and the recommended way to create a Future
   object is to call "loop.create_future()". This way alternative
   event loop implementations can inject their own optimized
   implementations of a Future object.

   Changed in version 3.7: Added support for the "contextvars"
   module.

   Deprecated since version 3.10: Deprecation warning is emitted if
   *loop* is not specified and there is no running event loop.

   result()

      Return the result of the Future.

      If the Future is *done* and has a result set by the
      "set_result()" method, the result value is returned.

      If the Future is *done* and has an exception set by the
      "set_exception()" method, this method raises the exception.

      If the Future has been *cancelled*, this method raises a
      "CancelledError" exception.

      If the Future’s result isn’t yet available, this method raises
      an "InvalidStateError" exception.

   set_result(result)

      Mark the Future as *done* and set its result.

      Raises an "InvalidStateError" error if the Future is already
      *done*.

   set_exception(exception)

      Mark the Future as *done* and set an exception.

      Raises an "InvalidStateError" error if the Future is already
      *done*.

   done()

      Return "True" if the Future is *done*.

      A Future is *done* if it was *cancelled* or if it has a result
      or an exception set with "set_result()" or "set_exception()"
      calls.

   cancelled()

      Return "True" if the Future was *cancelled*.

      The method is usually used to check if a Future is not
      *cancelled* before setting a result or an exception for it:

         if not fut.cancelled():
             fut.set_result(42)

   add_done_callback(callback, *, context=None)

      Add a callback to be run when the Future is *done*.

      The *callback* is called with the Future object as its only
      argument.
      If the Future is already *done* when this method is called,
      the callback is scheduled with "loop.call_soon()".

      An optional keyword-only *context* argument allows specifying
      a custom "contextvars.Context" for the *callback* to run in.
      The current context is used when no *context* is provided.

      "functools.partial()" can be used to pass parameters to the
      callback, e.g.:

         # Call 'print("Future:", fut)' when "fut" is done.
         fut.add_done_callback(
             functools.partial(print, "Future:"))

      Changed in version 3.7: The *context* keyword-only parameter
      was added. See **PEP 567** for more details.

   remove_done_callback(callback)

      Remove *callback* from the callbacks list.

      Returns the number of callbacks removed, which is typically 1,
      unless a callback was added more than once.

   cancel(msg=None)

      Cancel the Future and schedule callbacks.

      If the Future is already *done* or *cancelled*, return
      "False". Otherwise, change the Future’s state to *cancelled*,
      schedule the callbacks, and return "True".

      Changed in version 3.9: Added the *msg* parameter.

   exception()

      Return the exception that was set on this Future.

      The exception (or "None" if no exception was set) is returned
      only if the Future is *done*.

      If the Future has been *cancelled*, this method raises a
      "CancelledError" exception.

      If the Future isn’t *done* yet, this method raises an
      "InvalidStateError" exception.

   get_loop()

      Return the event loop the Future object is bound to.

      Added in version 3.7.

This example creates a Future object, creates and schedules an
asynchronous Task to set result for the Future, and waits until the
Future has a result:

   async def set_after(fut, delay, value):
       # Sleep for *delay* seconds.
       await asyncio.sleep(delay)

       # Set *value* as a result of *fut* Future.
       fut.set_result(value)

   async def main():
       # Get the current event loop.
       loop = asyncio.get_running_loop()

       # Create a new Future object.
       fut = loop.create_future()

       # Run "set_after()" coroutine in a parallel Task.
       # We are using the low-level "loop.create_task()" API here
       # because we already have a reference to the event loop at
       # hand. Otherwise we could have just used
       # "asyncio.create_task()".
       loop.create_task(
           set_after(fut, 1, '... world'))

       print('hello ...')

       # Wait until *fut* has a result (1 second) and print it.
       print(await fut)

   asyncio.run(main())

Important: The Future object was designed to mimic
  "concurrent.futures.Future". Key differences include:

  * unlike asyncio Futures, "concurrent.futures.Future" instances
    cannot be awaited.

  * "asyncio.Future.result()" and "asyncio.Future.exception()" do
    not accept the *timeout* argument.

  * "asyncio.Future.result()" and "asyncio.Future.exception()"
    raise an "InvalidStateError" exception when the Future is not
    *done*.

  * Callbacks registered with "asyncio.Future.add_done_callback()"
    are not called immediately. They are scheduled with
    "loop.call_soon()" instead.

  * asyncio Future is not compatible with the
    "concurrent.futures.wait()" and
    "concurrent.futures.as_completed()" functions.

  * "asyncio.Future.cancel()" accepts an optional "msg" argument,
    but "concurrent.futures.Future.cancel()" does not.

Low-level API Index
*******************

This page lists all low-level asyncio APIs.

Obtaining the Event Loop
========================

+----------------------------------------------------+----------------------------------------------------+
| "asyncio.get_running_loop()"                       | The **preferred** function to get the running      |
|                                                    | event loop.                                        |
+----------------------------------------------------+----------------------------------------------------+
| "asyncio.get_event_loop()"                         | Get an event loop instance (running or current     |
|                                                    | via the current policy).                           |
+----------------------------------------------------+----------------------------------------------------+
| "asyncio.set_event_loop()"                         | Set the event loop as current via the current      |
|                                                    | policy.                                            |
+----------------------------------------------------+----------------------------------------------------+
| "asyncio.new_event_loop()"                         | Create a new event loop.                           |
+----------------------------------------------------+----------------------------------------------------+

-[ Examples ]-

* Using asyncio.get_running_loop().

Event Loop Methods
==================

See also the main documentation section about the Event Loop
Methods.

-[ Lifecycle ]-

+----------------------------------------------------+----------------------------------------------------+
| "loop.run_until_complete()"                        | Run a Future/Task/awaitable until complete.        |
+----------------------------------------------------+----------------------------------------------------+
| "loop.run_forever()"                               | Run the event loop forever.                        |
+----------------------------------------------------+----------------------------------------------------+
| "loop.stop()"                                      | Stop the event loop.                               |
+----------------------------------------------------+----------------------------------------------------+
| "loop.close()"                                     | Close the event loop.                              |
+----------------------------------------------------+----------------------------------------------------+
| "loop.is_running()"                                | Return "True" if the event loop is running.        |
+----------------------------------------------------+----------------------------------------------------+
| "loop.is_closed()"                                 | Return "True" if the event loop is closed.         |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.shutdown_asyncgens()"                | Close asynchronous generators.                     |
+----------------------------------------------------+----------------------------------------------------+

-[ Debugging ]-

+----------------------------------------------------+----------------------------------------------------+
| "loop.set_debug()"                                 | Enable or disable the debug mode.                  |
+----------------------------------------------------+----------------------------------------------------+
| "loop.get_debug()"                                 | Get the current debug mode.                        |
+----------------------------------------------------+----------------------------------------------------+

-[ Scheduling Callbacks ]-

+----------------------------------------------------+----------------------------------------------------+
| "loop.call_soon()"                                 | Invoke a callback soon.                            |
+----------------------------------------------------+----------------------------------------------------+
| "loop.call_soon_threadsafe()"                      | A thread-safe variant of "loop.call_soon()".       |
+----------------------------------------------------+----------------------------------------------------+
| "loop.call_later()"                                | Invoke a callback *after* the given time.          |
+----------------------------------------------------+----------------------------------------------------+
| "loop.call_at()"                                   | Invoke a callback *at* the given time.             |
+----------------------------------------------------+----------------------------------------------------+

-[ Thread/Process Pool ]-

+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.run_in_executor()"                   | Run a CPU-bound or other blocking function in a    |
|                                                    | "concurrent.futures" executor.                     |
+----------------------------------------------------+----------------------------------------------------+
| "loop.set_default_executor()"                      | Set the default executor for                       |
|                                                    | "loop.run_in_executor()".                          |
+----------------------------------------------------+----------------------------------------------------+

-[ Tasks and Futures ]-

+----------------------------------------------------+----------------------------------------------------+
| "loop.create_future()"                             | Create a "Future" object.                          |
+----------------------------------------------------+----------------------------------------------------+
| "loop.create_task()"                               | Schedule coroutine as a "Task".                    |
+----------------------------------------------------+----------------------------------------------------+
| "loop.set_task_factory()"                          | Set a factory used by "loop.create_task()" to      |
|                                                    | create "Tasks".                                    |
+----------------------------------------------------+----------------------------------------------------+
| "loop.get_task_factory()"                          | Get the factory "loop.create_task()" uses to       |
|                                                    | create "Tasks".                                    |
+----------------------------------------------------+----------------------------------------------------+

-[ DNS ]-

+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.getaddrinfo()"                       | Asynchronous version of "socket.getaddrinfo()".    |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.getnameinfo()"                       | Asynchronous version of "socket.getnameinfo()".    |
+----------------------------------------------------+----------------------------------------------------+

-[ Networking and IPC ]-

+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.create_connection()"                 | Open a TCP connection.                             |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.create_server()"                     | Create a TCP server.                               |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.create_unix_connection()"            | Open a Unix socket connection.                     |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.create_unix_server()"                | Create a Unix socket server.                       |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.connect_accepted_socket()"           | Wrap a "socket" into a "(transport, protocol)"     |
|                                                    | pair.                                              |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.create_datagram_endpoint()"          | Open a datagram (UDP) connection.                  |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sendfile()"                          | Send a file over a transport.                      |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.start_tls()"                         | Upgrade an existing connection to TLS.             |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.connect_read_pipe()"                 | Wrap a read end of a pipe into a "(transport,      |
|                                                    | protocol)" pair.                                   |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.connect_write_pipe()"                | Wrap a write end of a pipe into a "(transport,     |
|                                                    | protocol)" pair.                                   |
+----------------------------------------------------+----------------------------------------------------+

-[ Sockets ]-

+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_recv()"                         | Receive data from the "socket".                    |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_recv_into()"                    | Receive data from the "socket" into a buffer.      |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_recvfrom()"                     | Receive a datagram from the "socket".              |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_recvfrom_into()"                | Receive a datagram from the "socket" into a        |
|                                                    | buffer.                                            |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_sendall()"                      | Send data to the "socket".                         |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_sendto()"                       | Send a datagram via the "socket" to the given      |
|                                                    | address.                                           |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_connect()"                      | Connect the "socket".                              |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_accept()"                       | Accept a "socket" connection.                      |
+----------------------------------------------------+----------------------------------------------------+
| "await" "loop.sock_sendfile()"                     | Send a file over the "socket".                     |
+----------------------------------------------------+----------------------------------------------------+
| "loop.add_reader()"                                | Start watching a file descriptor for read          |
|                                                    | availability.                                      |
+----------------------------------------------------+----------------------------------------------------+
| "loop.remove_reader()"                             | Stop watching a file descriptor for read           |
|                                                    | availability.                                      |
+----------------------------------------------------+----------------------------------------------------+
| "loop.add_writer()"                                | Start watching a file descriptor for write         |
|                                                    | availability.                                      |
+----------------------------------------------------+----------------------------------------------------+
| "loop.remove_writer()"                             | Stop watching a file descriptor for write          |
|                                                    | availability.                                      |
+----------------------------------------------------+----------------------------------------------------+

-[ Unix Signals ]-

+----------------------------------------------------+----------------------------------------------------+
| "loop.add_signal_handler()"                        | Add a handler for a "signal".                      |
+----------------------------------------------------+----------------------------------------------------+
| "loop.remove_signal_handler()"                     | Remove a handler for a "signal".                   |
+----------------------------------------------------+----------------------------------------------------+

-[ Subprocesses ]-

+----------------------------------------------------+----------------------------------------------------+
| "loop.subprocess_exec()"                           | Spawn a subprocess.                                |
+----------------------------------------------------+----------------------------------------------------+
| "loop.subprocess_shell()"                          | Spawn a subprocess from a shell command.           |
+----------------------------------------------------+----------------------------------------------------+

-[ Error Handling ]-

+----------------------------------------------------+----------------------------------------------------+
| "loop.call_exception_handler()"                    | Call the exception handler.                        |
+----------------------------------------------------+----------------------------------------------------+
| "loop.set_exception_handler()"                     | Set a new exception handler.                       |
+----------------------------------------------------+----------------------------------------------------+
| "loop.get_exception_handler()"                     | Get the current exception handler.                 |
+----------------------------------------------------+----------------------------------------------------+
| "loop.default_exception_handler()"                 | The default exception handler implementation.      |
+----------------------------------------------------+----------------------------------------------------+

-[ Examples ]-

* Using asyncio.new_event_loop() and loop.run_forever().

* Using loop.call_later().

* Using "loop.create_connection()" to implement an echo-client.

* Using "loop.create_connection()" to connect a socket.

* Using add_reader() to watch an FD for read events.

* Using loop.add_signal_handler().

* Using loop.subprocess_exec().

Transports
==========

All transports implement the following methods:

+----------------------------------------------------+----------------------------------------------------+
| "transport.close()"                                | Close the transport.                               |
+----------------------------------------------------+----------------------------------------------------+
| "transport.is_closing()"                           | Return "True" if the transport is closing or is    |
|                                                    | closed.                                            |
+----------------------------------------------------+----------------------------------------------------+
| "transport.get_extra_info()"                       | Request for information about the transport.       |
+----------------------------------------------------+----------------------------------------------------+
| "transport.set_protocol()"                         | Set a new protocol.                                |
+----------------------------------------------------+----------------------------------------------------+
| "transport.get_protocol()"                         | Return the current protocol.                       |
+----------------------------------------------------+----------------------------------------------------+

Transports that can receive data (TCP and Unix connections, pipes,
etc). Returned from methods like "loop.create_connection()",
"loop.create_unix_connection()", "loop.connect_read_pipe()", etc:

-[ Read Transports ]-

+----------------------------------------------------+----------------------------------------------------+
| "transport.is_reading()"                           | Return "True" if the transport is receiving.       |
+----------------------------------------------------+----------------------------------------------------+
| "transport.pause_reading()"                        | Pause receiving.                                   |
+----------------------------------------------------+----------------------------------------------------+
| "transport.resume_reading()"                       | Resume receiving.                                  |
+----------------------------------------------------+----------------------------------------------------+

Transports that can send data (TCP and Unix connections, pipes,
etc). Returned from methods like "loop.create_connection()",
"loop.create_unix_connection()", "loop.connect_write_pipe()", etc:

-[ Write Transports ]-

+----------------------------------------------------+----------------------------------------------------+
| "transport.write()"                                | Write data to the transport.                       |
+----------------------------------------------------+----------------------------------------------------+
| "transport.writelines()"                           | Write buffers to the transport.                    |
+----------------------------------------------------+----------------------------------------------------+
| "transport.can_write_eof()"                        | Return "True" if the transport supports sending    |
|                                                    | EOF.                                               |
+----------------------------------------------------+----------------------------------------------------+
| "transport.write_eof()"                            | Close and send EOF after flushing buffered data.   |
+----------------------------------------------------+----------------------------------------------------+
| "transport.abort()"                                | Close the transport immediately.                   |
+----------------------------------------------------+----------------------------------------------------+
| "transport.get_write_buffer_size()"                | Return the current size of the output buffer.      |
+----------------------------------------------------+----------------------------------------------------+
| "transport.get_write_buffer_limits()"              | Return high and low water marks for write flow     |
|                                                    | control.                                           |
+----------------------------------------------------+----------------------------------------------------+
| "transport.set_write_buffer_limits()"              | Set new high and low water marks for write flow    |
|                                                    | control.                                           |
+----------------------------------------------------+----------------------------------------------------+

Transports returned by "loop.create_datagram_endpoint()":

-[ Datagram Transports ]-

+----------------------------------------------------+----------------------------------------------------+
| "transport.sendto()"                               | Send data to the remote peer.                      |
+----------------------------------------------------+----------------------------------------------------+
| "transport.abort()"                                | Close the transport immediately.                   |
+----------------------------------------------------+----------------------------------------------------+

Low-level transport abstraction over subprocesses. Returned by
"loop.subprocess_exec()" and "loop.subprocess_shell()":

-[ Subprocess Transports ]-

+----------------------------------------------------+----------------------------------------------------+
| "transport.get_pid()"                              | Return the subprocess process id.                  |
+----------------------------------------------------+----------------------------------------------------+
| "transport.get_pipe_transport()"                   | Return the transport for the requested             |
|                                                    | communication pipe (*stdin*, *stdout*, or          |
|                                                    | *stderr*).                                         |
+----------------------------------------------------+----------------------------------------------------+
| "transport.get_returncode()"                       | Return the subprocess return code.                 |
+----------------------------------------------------+----------------------------------------------------+
| "transport.kill()"                                 | Kill the subprocess.                               |
+----------------------------------------------------+----------------------------------------------------+
| "transport.send_signal()"                          | Send a signal to the subprocess.                   |
+----------------------------------------------------+----------------------------------------------------+
| "transport.terminate()"                            | Stop the subprocess.                               |
+----------------------------------------------------+----------------------------------------------------+
| "transport.close()"                                | Kill the subprocess and close all pipes.           |
+----------------------------------------------------+----------------------------------------------------+

Protocols
=========

Protocol classes can implement the following **callback methods**:

+----------------------------------------------------+----------------------------------------------------+
| "callback" "connection_made()"                     | Called when a connection is made.                  |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "connection_lost()"                     | Called when the connection is lost or closed.      |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "pause_writing()"                       | Called when the transport’s buffer goes over the   |
|                                                    | high water mark.                                   |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "resume_writing()"                      | Called when the transport’s buffer drains below    |
|                                                    | the low water mark.                                |
+----------------------------------------------------+----------------------------------------------------+

-[ Streaming Protocols (TCP, Unix Sockets, Pipes) ]-

+----------------------------------------------------+----------------------------------------------------+
| "callback" "data_received()"                       | Called when some data is received.                 |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "eof_received()"                        | Called when an EOF is received.                    |
+----------------------------------------------------+----------------------------------------------------+

-[ Buffered Streaming Protocols ]-

+----------------------------------------------------+----------------------------------------------------+
| "callback" "get_buffer()"                          | Called to allocate a new receive buffer.           |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "buffer_updated()"                      | Called when the buffer was updated with the        |
|                                                    | received data.                                     |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "eof_received()"                        | Called when an EOF is received.                    |
+----------------------------------------------------+----------------------------------------------------+

-[ Datagram Protocols ]-

+----------------------------------------------------+----------------------------------------------------+
| "callback" "datagram_received()"                   | Called when a datagram is received.                |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "error_received()"                      | Called when a previous send or receive operation   |
|                                                    | raises an "OSError".                               |
+----------------------------------------------------+----------------------------------------------------+

-[ Subprocess Protocols ]-

+----------------------------------------------------+----------------------------------------------------+
| "callback" "pipe_data_received()"                  | Called when the child process writes data into     |
|                                                    | its *stdout* or *stderr* pipe.                     |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "pipe_connection_lost()"                | Called when one of the pipes communicating with    |
|                                                    | the child process is closed.                       |
+----------------------------------------------------+----------------------------------------------------+
| "callback" "process_exited()"                      | Called when the child process has exited. It can   |
|                                                    | be called before "pipe_data_received()" and        |
|                                                    | "pipe_connection_lost()" methods.                  |
+----------------------------------------------------+----------------------------------------------------+

Event Loop Policies
===================

Policies are a low-level mechanism to alter the behavior of
functions like "asyncio.get_event_loop()". See also the main
policies section for more details.

-[ Accessing Policies ]-

+----------------------------------------------------+----------------------------------------------------+
| "asyncio.get_event_loop_policy()"                  | Return the current process-wide policy.            |
+----------------------------------------------------+----------------------------------------------------+
| "asyncio.set_event_loop_policy()"                  | Set a new process-wide policy.                     |
+----------------------------------------------------+----------------------------------------------------+
| "AbstractEventLoopPolicy"                          | Base class for policy objects.                     |
+----------------------------------------------------+----------------------------------------------------+

Platform Support
****************

The "asyncio" module is designed to be portable, but some platforms
have subtle differences and limitations due to the platforms’
underlying architecture and capabilities.

All Platforms
=============

* "loop.add_reader()" and "loop.add_writer()" cannot be used to
  monitor file I/O.

Windows
=======

**Source code:** Lib/asyncio/proactor_events.py,
Lib/asyncio/windows_events.py, Lib/asyncio/windows_utils.py

======================================================================

Changed in version 3.8: On Windows, "ProactorEventLoop" is now the
default event loop.

No event loop on Windows supports the following methods:

* "loop.create_unix_connection()" and "loop.create_unix_server()"
  are not supported. The "socket.AF_UNIX" socket family is specific
  to Unix.

* "loop.add_signal_handler()" and "loop.remove_signal_handler()"
  are not supported.

"SelectorEventLoop" has the following limitations:

* "SelectSelector" is used to wait on socket events: it supports
  sockets and is limited to 512 sockets.

* "loop.add_reader()" and "loop.add_writer()" only accept socket
  handles (e.g. pipe file descriptors are not supported).

* Pipes are not supported, so the "loop.connect_read_pipe()" and
  "loop.connect_write_pipe()" methods are not implemented.

* Subprocesses are not supported, i.e. "loop.subprocess_exec()" and
  "loop.subprocess_shell()" methods are not implemented.

"ProactorEventLoop" has the following limitations:

* The "loop.add_reader()" and "loop.add_writer()" methods are not
  supported.

The resolution of the monotonic clock on Windows is usually around
15.6 milliseconds. The best resolution is 0.5 milliseconds. The
resolution depends on the hardware (availability of HPET) and on
the Windows configuration.

Subprocess Support on Windows
-----------------------------

On Windows, the default event loop "ProactorEventLoop" supports
subprocesses, whereas "SelectorEventLoop" does not.
The "policy.set_child_watcher()" function is also not supported, as
"ProactorEventLoop" has a different mechanism to watch child
processes.

macOS
=====

Modern macOS versions are fully supported.

-[ macOS <= 10.8 ]-

On macOS 10.6, 10.7 and 10.8, the default event loop uses
"selectors.KqueueSelector", which does not support character devices
on these versions. The "SelectorEventLoop" can be manually
configured to use "SelectSelector" or "PollSelector" to support
character devices on these older versions of macOS. Example:

   import asyncio
   import selectors

   selector = selectors.SelectSelector()
   loop = asyncio.SelectorEventLoop(selector)
   asyncio.set_event_loop(loop)

Policies
********

An event loop policy is a global object used to get and set the
current event loop, as well as create new event loops. The default
policy can be replaced with built-in alternatives to use different
event loop implementations, or substituted by a custom policy that
can override these behaviors.

The policy object gets and sets a separate event loop per *context*.
This is per-thread by default, though custom policies could define
*context* differently.

Custom event loop policies can control the behavior of
"get_event_loop()", "set_event_loop()", and "new_event_loop()".

Policy objects should implement the APIs defined in the
"AbstractEventLoopPolicy" abstract base class.

Getting and Setting the Policy
==============================

The following functions can be used to get and set the policy for
the current process:

asyncio.get_event_loop_policy()

   Return the current process-wide policy.

asyncio.set_event_loop_policy(policy)

   Set the current process-wide policy to *policy*.

   If *policy* is set to "None", the default policy is restored.

Policy Objects
==============

The abstract event loop policy base class is defined as follows:

class asyncio.AbstractEventLoopPolicy

   An abstract base class for asyncio policies.

   get_event_loop()

      Get the event loop for the current context.

      Return an event loop object implementing the
      "AbstractEventLoop" interface. This method should never return
      "None".

      Changed in version 3.6.

   set_event_loop(loop)

      Set the event loop for the current context to *loop*.

   new_event_loop()

      Create and return a new event loop object.

      This method should never return "None".

   get_child_watcher()

      Get a child process watcher object.

      Return a watcher object implementing the
      "AbstractChildWatcher" interface.

      This function is Unix specific.

      Deprecated since version 3.12.

   set_child_watcher(watcher)

      Set the current child process watcher to *watcher*.

      This function is Unix specific.

      Deprecated since version 3.12.

asyncio ships with the following built-in policies:

class asyncio.DefaultEventLoopPolicy

   The default asyncio policy. Uses "SelectorEventLoop" on Unix and
   "ProactorEventLoop" on Windows.

   There is no need to install the default policy manually. asyncio
   is configured to use the default policy automatically.

   Changed in version 3.8: On Windows, "ProactorEventLoop" is now
   used by default.

   Deprecated since version 3.12: The "get_event_loop()" method of
   the default asyncio policy now emits a "DeprecationWarning" if
   there is no current event loop set and it decides to create one.
   In some future Python release this will become an error.

class asyncio.WindowsSelectorEventLoopPolicy

   An alternative event loop policy that uses the
   "SelectorEventLoop" event loop implementation.

   Availability: Windows.
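   For example, an application that needs "loop.add_reader()" with
   sockets on Windows can opt in to the selector-based loop. A
   minimal sketch:

      import asyncio
      import sys

      if sys.platform == "win32":
          asyncio.set_event_loop_policy(
              asyncio.WindowsSelectorEventLoopPolicy())

      async def main():
          loop = asyncio.get_running_loop()
          # SelectorEventLoop on Windows once the policy above is set
          print(type(loop).__name__)

      asyncio.run(main())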
class asyncio.WindowsProactorEventLoopPolicy

   An alternative event loop policy that uses the
   "ProactorEventLoop" event loop implementation.

   Availability: Windows.

Process Watchers
================

A process watcher allows customization of how an event loop
monitors child processes on Unix. Specifically, the event loop
needs to know when a child process has exited.

In asyncio, child processes are created with the
"create_subprocess_exec()" and "loop.subprocess_exec()" functions.

asyncio defines the "AbstractChildWatcher" abstract base class,
which child watchers should implement, and has four different
implementations: "ThreadedChildWatcher" (configured to be used by
default), "MultiLoopChildWatcher", "SafeChildWatcher", and
"FastChildWatcher".

See also the Subprocess and Threads section.

The following two functions can be used to customize the child
process watcher implementation used by the asyncio event loop:

asyncio.get_child_watcher()

   Return the current child watcher for the current policy.

   Deprecated since version 3.12.

asyncio.set_child_watcher(watcher)

   Set the current child watcher to *watcher* for the current
   policy. *watcher* must implement methods defined in the
   "AbstractChildWatcher" base class.

   Deprecated since version 3.12.

Note: Third-party event loop implementations might not support
  custom child watchers. For such event loops, using
  "set_child_watcher()" might be prohibited or have no effect.

class asyncio.AbstractChildWatcher

   add_child_handler(pid, callback, *args)

      Register a new child handler.

      Arrange for "callback(pid, returncode, *args)" to be called
      when a process with PID equal to *pid* terminates. Specifying
      another callback for the same process replaces the previous
      handler.

      The *callback* callable must be thread-safe.

   remove_child_handler(pid)

      Remove the handler for the process with PID equal to *pid*.

      The function returns "True" if the handler was successfully
      removed, "False" if there was nothing to remove.

   attach_loop(loop)

      Attach the watcher to an event loop.

      If the watcher was previously attached to an event loop, then
      it is first detached before attaching to the new loop.

      Note: loop may be "None".

   is_active()

      Return "True" if the watcher is ready to use.

      Spawning a subprocess with an *inactive* current child watcher
      raises "RuntimeError".

      Added in version 3.8.

   close()

      Close the watcher.

      This method has to be called to ensure that underlying
      resources are cleaned up.

      Deprecated since version 3.12.

class asyncio.ThreadedChildWatcher

   This implementation starts a new waiting thread for every
   subprocess spawn.

   It works reliably even when the asyncio event loop is run in a
   non-main OS thread.

   There is no noticeable overhead when handling a large number of
   children (*O*(1) each time a child terminates), but starting a
   thread per process requires extra memory.

   This watcher is used by default.

   Added in version 3.8.

class asyncio.MultiLoopChildWatcher

   This implementation registers a "SIGCHLD" signal handler on
   instantiation. That can break third-party code that installs a
   custom handler for the "SIGCHLD" signal.

   The watcher avoids disrupting other code spawning processes by
   polling every process explicitly on a "SIGCHLD" signal.

   There is no limitation for running subprocesses from different
   threads once the watcher is installed.

   The solution is safe but it has a significant overhead when
   handling a large number of processes (*O*(*n*) each time a
   "SIGCHLD" is received).

   Added in version 3.8.

   Deprecated since version 3.12.
class asyncio.SafeChildWatcher

   This implementation uses the active event loop from the main
   thread to handle the "SIGCHLD" signal. If the main thread has no
   running event loop, another thread cannot spawn a subprocess
   ("RuntimeError" is raised).

   The watcher avoids disrupting other code spawning processes by
   polling every process explicitly on a "SIGCHLD" signal.

   This solution is as safe as "MultiLoopChildWatcher" and has the
   same *O*(*n*) complexity, but requires a running event loop in
   the main thread to work.

   Deprecated since version 3.12.

class asyncio.FastChildWatcher

   This implementation reaps every terminated process by calling
   "os.waitpid(-1)" directly, possibly breaking other code spawning
   processes and waiting for their termination.

   There is no noticeable overhead when handling a large number of
   children (*O*(1) each time a child terminates).

   This solution requires a running event loop in the main thread
   to work, as does "SafeChildWatcher".

   Deprecated since version 3.12.

class asyncio.PidfdChildWatcher

   This implementation polls process file descriptors (pidfds) to
   await child process termination. In some respects,
   "PidfdChildWatcher" is a “Goldilocks” child watcher
   implementation. It doesn’t require signals or threads, doesn’t
   interfere with any processes launched outside the event loop,
   and scales linearly with the number of subprocesses launched by
   the event loop. The main disadvantage is that pidfds are specific
   to Linux, and only work on recent (5.3+) kernels.

   Added in version 3.9.

Custom Policies
===============

To implement a new event loop policy, it is recommended to subclass
"DefaultEventLoopPolicy" and override the methods for which custom
behavior is wanted, e.g.:

   class MyEventLoopPolicy(asyncio.DefaultEventLoopPolicy):

       def get_event_loop(self):
           """Get the event loop.

           This may be None or an instance of EventLoop.
           """
           loop = super().get_event_loop()
           # Do something with loop ...
           return loop

   asyncio.set_event_loop_policy(MyEventLoopPolicy())

Transports and Protocols
************************

-[ Preface ]-

Transports and Protocols are used by the **low-level** event loop
APIs such as "loop.create_connection()". They use a callback-based
programming style and enable high-performance implementations of
network or IPC protocols (e.g. HTTP).

Essentially, transports and protocols should only be used in
libraries and frameworks and never in high-level asyncio
applications.

This documentation page covers both Transports and Protocols.

-[ Introduction ]-

At the highest level, the transport is concerned with *how* bytes
are transmitted, while the protocol determines *which* bytes to
transmit (and to some extent when).

A different way of saying the same thing: a transport is an
abstraction for a socket (or similar I/O endpoint) while a protocol
is an abstraction for an application, from the transport’s point of
view.

Yet another view is that the transport and protocol interfaces
together define an abstract interface for using network I/O and
interprocess I/O.

There is always a 1:1 relationship between transport and protocol
objects: the protocol calls transport methods to send data, while
the transport calls protocol methods to pass it data that has been
received.

Most connection-oriented event loop methods (such as
"loop.create_connection()") usually accept a *protocol_factory*
argument used to create a *Protocol* object for an accepted
connection, represented by a *Transport* object. Such methods
usually return a tuple of "(transport, protocol)".
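To make the relationship concrete, here is a minimal sketch of our
own (it assumes an echo server is already listening on
127.0.0.1:8888): the protocol decides *what* to send, the transport
does the sending, and "loop.create_connection()" returns the
"(transport, protocol)" pair described above:

   import asyncio

   class EchoOnce(asyncio.Protocol):
       def __init__(self, message, on_done):
           self.message = message
           self.on_done = on_done

       def connection_made(self, transport):
           # The protocol asks its transport to send bytes.
           transport.write(self.message)

       def data_received(self, data):
           # The transport hands received bytes back to the protocol.
           print('Received:', data.decode())
           if not self.on_done.done():
               self.on_done.set_result(True)

   async def main():
       loop = asyncio.get_running_loop()
       done = loop.create_future()
       transport, protocol = await loop.create_connection(
           lambda: EchoOnce(b'hello', done), '127.0.0.1', 8888)
       try:
           # Wait until the reply arrives.
           await done
       finally:
           transport.close()

   asyncio.run(main())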
-[ Contents ]- This documentation page contains the following sections: * The Transports section documents asyncio "BaseTransport", "ReadTransport", "WriteTransport", "Transport", "DatagramTransport", and "SubprocessTransport" classes. * The Protocols section documents asyncio "BaseProtocol", "Protocol", "BufferedProtocol", "DatagramProtocol", and "SubprocessProtocol" classes. * The Examples section showcases how to work with transports, protocols, and low-level event loop APIs. Transports ========== **Source code:** Lib/asyncio/transports.py ====================================================================== Transports are classes provided by "asyncio" in order to abstract various kinds of communication channels. Transport objects are always instantiated by an asyncio event loop. asyncio implements transports for TCP, UDP, SSL, and subprocess pipes. The methods available on a transport depend on the transport’s kind. The transport classes are not thread safe. Transports Hierarchy -------------------- class asyncio.BaseTransport Base class for all transports. Contains methods that all asyncio transports share. class asyncio.WriteTransport(BaseTransport) A base transport for write-only connections. Instances of the *WriteTransport* class are returned from the "loop.connect_write_pipe()" event loop method and are also used by subprocess-related methods like "loop.subprocess_exec()". class asyncio.ReadTransport(BaseTransport) A base transport for read-only connections. Instances of the *ReadTransport* class are returned from the "loop.connect_read_pipe()" event loop method and are also used by subprocess-related methods like "loop.subprocess_exec()". class asyncio.Transport(WriteTransport, ReadTransport) Interface representing a bidirectional transport, such as a TCP connection. The user does not instantiate a transport directly; they call a utility function, passing it a protocol factory and other information necessary to create the transport and protocol. Instances of the *Transport* class are returned from or used by event loop methods like "loop.create_connection()", "loop.create_unix_connection()", "loop.create_server()", "loop.sendfile()", etc. class asyncio.DatagramTransport(BaseTransport) A transport for datagram (UDP) connections. Instances of the *DatagramTransport* class are returned from the "loop.create_datagram_endpoint()" event loop method. class asyncio.SubprocessTransport(BaseTransport) An abstraction to represent a connection between a parent and its child OS process. Instances of the *SubprocessTransport* class are returned from event loop methods "loop.subprocess_shell()" and "loop.subprocess_exec()". Base Transport -------------- BaseTransport.close() Close the transport. If the transport has a buffer for outgoing data, buffered data will be flushed asynchronously. No more data will be received. After all buffered data is flushed, the protocol’s "protocol.connection_lost()" method will be called with "None" as its argument. The transport should not be used once it is closed. BaseTransport.is_closing() Return "True" if the transport is closing or is closed. BaseTransport.get_extra_info(name, default=None) Return information about the transport or underlying resources it uses. *name* is a string representing the piece of transport-specific information to get. *default* is the value to return if the information is not available, or if the transport does not support querying it with the given third-party event loop implementation or on the current platform. 
   For example, the following code attempts to get the underlying socket object of the transport:

      sock = transport.get_extra_info('socket')
      if sock is not None:
          print(sock.getsockopt(...))

   Categories of information that can be queried on some transports:

   * socket:

     * "'peername'": the remote address to which the socket is connected, result of "socket.socket.getpeername()" ("None" on error)

     * "'socket'": "socket.socket" instance

     * "'sockname'": the socket’s own address, result of "socket.socket.getsockname()"

   * SSL socket:

     * "'compression'": the compression algorithm being used as a string, or "None" if the connection isn’t compressed; result of "ssl.SSLSocket.compression()"

     * "'cipher'": a three-value tuple containing the name of the cipher being used, the version of the SSL protocol that defines its use, and the number of secret bits being used; result of "ssl.SSLSocket.cipher()"

     * "'peercert'": peer certificate; result of "ssl.SSLSocket.getpeercert()"

     * "'sslcontext'": "ssl.SSLContext" instance

     * "'ssl_object'": "ssl.SSLObject" or "ssl.SSLSocket" instance

   * pipe:

     * "'pipe'": pipe object

   * subprocess:

     * "'subprocess'": "subprocess.Popen" instance

BaseTransport.set_protocol(protocol)

   Set a new protocol.

   Switching protocols should only be done when both protocols are documented to support the switch.

BaseTransport.get_protocol()

   Return the current protocol.

Read-only Transports
--------------------

ReadTransport.is_reading()

   Return "True" if the transport is receiving new data.

   Added in version 3.7.

ReadTransport.pause_reading()

   Pause the receiving end of the transport. No data will be passed to the protocol’s "protocol.data_received()" method until "resume_reading()" is called.

   Changed in version 3.7: The method is idempotent, i.e. it can be called when the transport is already paused or closed.

ReadTransport.resume_reading()

   Resume the receiving end. The protocol’s "protocol.data_received()" method will be called once again if some data is available for reading.

   Changed in version 3.7: The method is idempotent, i.e. it can be called when the transport is already reading.

Write-only Transports
---------------------

WriteTransport.abort()

   Close the transport immediately, without waiting for pending operations to complete. Buffered data will be lost. No more data will be received. The protocol’s "protocol.connection_lost()" method will eventually be called with "None" as its argument.

WriteTransport.can_write_eof()

   Return "True" if the transport supports "write_eof()", "False" if not.

WriteTransport.get_write_buffer_size()

   Return the current size of the output buffer used by the transport.

WriteTransport.get_write_buffer_limits()

   Get the *high* and *low* watermarks for write flow control. Return a tuple "(low, high)" where *low* and *high* are positive numbers of bytes.

   Use "set_write_buffer_limits()" to set the limits.

   Added in version 3.4.2.

WriteTransport.set_write_buffer_limits(high=None, low=None)

   Set the *high* and *low* watermarks for write flow control.

   These two values (measured in number of bytes) control when the protocol’s "protocol.pause_writing()" and "protocol.resume_writing()" methods are called. If specified, the low watermark must be less than or equal to the high watermark. Neither *high* nor *low* can be negative.

   "pause_writing()" is called when the buffer size becomes greater than or equal to the *high* value. If writing has been paused, "resume_writing()" is called when the buffer size becomes less than or equal to the *low* value.

   The defaults are implementation-specific.
If only the high watermark is given, the low watermark defaults to an implementation-specific value less than or equal to the high watermark. Setting *high* to zero forces *low* to zero as well, and causes "pause_writing()" to be called whenever the buffer becomes non-empty. Setting *low* to zero causes "resume_writing()" to be called only once the buffer is empty. Use of zero for either limit is generally sub-optimal as it reduces opportunities for doing I/O and computation concurrently. Use "get_write_buffer_limits()" to get the limits. WriteTransport.write(data) Write some *data* bytes to the transport. This method does not block; it buffers the data and arranges for it to be sent out asynchronously. WriteTransport.writelines(list_of_data) Write a list (or any iterable) of data bytes to the transport. This is functionally equivalent to calling "write()" on each element yielded by the iterable, but may be implemented more efficiently. WriteTransport.write_eof() Close the write end of the transport after flushing all buffered data. Data may still be received. This method can raise "NotImplementedError" if the transport (e.g. SSL) doesn’t support half-closed connections. Datagram Transports ------------------- DatagramTransport.sendto(data, addr=None) Send the *data* bytes to the remote peer given by *addr* (a transport-dependent target address). If *addr* is "None", the data is sent to the target address given on transport creation. This method does not block; it buffers the data and arranges for it to be sent out asynchronously. Changed in version 3.13: This method can be called with an empty bytes object to send a zero-length datagram. The buffer size calculation used for flow control is also updated to account for the datagram header. DatagramTransport.abort() Close the transport immediately, without waiting for pending operations to complete. Buffered data will be lost. No more data will be received. The protocol’s "protocol.connection_lost()" method will eventually be called with "None" as its argument. Subprocess Transports --------------------- SubprocessTransport.get_pid() Return the subprocess process id as an integer. SubprocessTransport.get_pipe_transport(fd) Return the transport for the communication pipe corresponding to the integer file descriptor *fd*: * "0": readable streaming transport of the standard input (*stdin*), or "None" if the subprocess was not created with "stdin=PIPE" * "1": writable streaming transport of the standard output (*stdout*), or "None" if the subprocess was not created with "stdout=PIPE" * "2": writable streaming transport of the standard error (*stderr*), or "None" if the subprocess was not created with "stderr=PIPE" * other *fd*: "None" SubprocessTransport.get_returncode() Return the subprocess return code as an integer or "None" if it hasn’t returned, which is similar to the "subprocess.Popen.returncode" attribute. SubprocessTransport.kill() Kill the subprocess. On POSIX systems, the function sends SIGKILL to the subprocess. On Windows, this method is an alias for "terminate()". See also "subprocess.Popen.kill()". SubprocessTransport.send_signal(signal) Send the *signal* number to the subprocess, as in "subprocess.Popen.send_signal()". SubprocessTransport.terminate() Stop the subprocess. On POSIX systems, this method sends "SIGTERM" to the subprocess. On Windows, the Windows API function "TerminateProcess()" is called to stop the subprocess. See also "subprocess.Popen.terminate()". 
SubprocessTransport.close()

   Kill the subprocess by calling the "kill()" method if the subprocess hasn’t returned yet, and close the transports of the *stdin*, *stdout*, and *stderr* pipes.

Protocols
=========

**Source code:** Lib/asyncio/protocols.py

======================================================================

asyncio provides a set of abstract base classes that should be used to implement network protocols. Those classes are meant to be used together with transports.

Subclasses of abstract base protocol classes may implement some or all methods. All these methods are callbacks: they are called by transports on certain events, for example when some data is received. A base protocol method should only be called by the corresponding transport.

Base Protocols
--------------

class asyncio.BaseProtocol

   Base protocol with methods that all protocols share.

class asyncio.Protocol(BaseProtocol)

   The base class for implementing streaming protocols (TCP, Unix sockets, etc).

class asyncio.BufferedProtocol(BaseProtocol)

   A base class for implementing streaming protocols with manual control of the receive buffer.

class asyncio.DatagramProtocol(BaseProtocol)

   The base class for implementing datagram (UDP) protocols.

class asyncio.SubprocessProtocol(BaseProtocol)

   The base class for implementing protocols communicating with child processes (unidirectional pipes).

Base Protocol
-------------

All asyncio protocols can implement Base Protocol callbacks.

-[ Connection Callbacks ]-

Connection callbacks are called on all protocols, exactly once per successful connection. All other protocol callbacks can only be called between those two methods.

BaseProtocol.connection_made(transport)

   Called when a connection is made.

   The *transport* argument is the transport representing the connection. The protocol is responsible for storing the reference to its transport.

BaseProtocol.connection_lost(exc)

   Called when the connection is lost or closed.

   The argument is either an exception object or "None". The latter means a regular EOF was received, or the connection was aborted or closed by this side of the connection.

-[ Flow Control Callbacks ]-

Flow control callbacks can be called by transports to pause or resume writing performed by the protocol.

See the documentation of the "set_write_buffer_limits()" method for more details.

BaseProtocol.pause_writing()

   Called when the transport’s buffer goes over the high watermark.

BaseProtocol.resume_writing()

   Called when the transport’s buffer drains below the low watermark.

If the buffer size equals the high watermark, "pause_writing()" is not called: the buffer size must go strictly over. Conversely, "resume_writing()" is called when the buffer size is equal to or lower than the low watermark. These end conditions are important to ensure that things go as expected when either mark is zero.

Streaming Protocols
-------------------

Event loop methods, such as "loop.create_server()", "loop.create_unix_server()", "loop.create_connection()", "loop.create_unix_connection()", "loop.connect_accepted_socket()", "loop.connect_read_pipe()", and "loop.connect_write_pipe()", accept factories that return streaming protocols.

Protocol.data_received(data)

   Called when some data is received. *data* is a non-empty bytes object containing the incoming data.

   Whether the data is buffered, chunked or reassembled depends on the transport. In general, you shouldn’t rely on specific semantics and instead make your parsing generic and flexible. However, data is always received in the correct order.
The method can be called an arbitrary number of times while a connection is open. However, "protocol.eof_received()" is called at most once. Once "eof_received()" is called, "data_received()" is not called anymore. Protocol.eof_received() Called when the other end signals it won’t send any more data (for example by calling "transport.write_eof()", if the other end also uses asyncio). This method may return a false value (including "None"), in which case the transport will close itself. Conversely, if this method returns a true value, the protocol used determines whether to close the transport. Since the default implementation returns "None", it implicitly closes the connection. Some transports, including SSL, don’t support half-closed connections, in which case returning true from this method will result in the connection being closed. State machine: start -> connection_made [-> data_received]* [-> eof_received]? -> connection_lost -> end Buffered Streaming Protocols ---------------------------- Added in version 3.7. Buffered Protocols can be used with any event loop method that supports Streaming Protocols. "BufferedProtocol" implementations allow explicit manual allocation and control of the receive buffer. Event loops can then use the buffer provided by the protocol to avoid unnecessary data copies. This can result in noticeable performance improvement for protocols that receive big amounts of data. Sophisticated protocol implementations can significantly reduce the number of buffer allocations. The following callbacks are called on "BufferedProtocol" instances: BufferedProtocol.get_buffer(sizehint) Called to allocate a new receive buffer. *sizehint* is the recommended minimum size for the returned buffer. It is acceptable to return smaller or larger buffers than what *sizehint* suggests. When set to -1, the buffer size can be arbitrary. It is an error to return a buffer with a zero size. "get_buffer()" must return an object implementing the buffer protocol. BufferedProtocol.buffer_updated(nbytes) Called when the buffer was updated with the received data. *nbytes* is the total number of bytes that were written to the buffer. BufferedProtocol.eof_received() See the documentation of the "protocol.eof_received()" method. "get_buffer()" can be called an arbitrary number of times during a connection. However, "protocol.eof_received()" is called at most once and, if called, "get_buffer()" and "buffer_updated()" won’t be called after it. State machine: start -> connection_made [-> get_buffer [-> buffer_updated]? ]* [-> eof_received]? -> connection_lost -> end Datagram Protocols ------------------ Datagram Protocol instances should be constructed by protocol factories passed to the "loop.create_datagram_endpoint()" method. DatagramProtocol.datagram_received(data, addr) Called when a datagram is received. *data* is a bytes object containing the incoming data. *addr* is the address of the peer sending the data; the exact format depends on the transport. DatagramProtocol.error_received(exc) Called when a previous send or receive operation raises an "OSError". *exc* is the "OSError" instance. This method is called in rare conditions, when the transport (e.g. UDP) detects that a datagram could not be delivered to its recipient. In many conditions though, undeliverable datagrams will be silently dropped. Note: On BSD systems (macOS, FreeBSD, etc.) 
flow control is not supported for datagram protocols, because there is no reliable way to detect send failures caused by writing too many packets. The socket always appears ‘ready’ and excess packets are dropped. An "OSError" with "errno" set to "errno.ENOBUFS" may or may not be raised; if it is raised, it will be reported to "DatagramProtocol.error_received()" but otherwise ignored.

Subprocess Protocols
--------------------

Subprocess Protocol instances should be constructed by protocol factories passed to the "loop.subprocess_exec()" and "loop.subprocess_shell()" methods.

SubprocessProtocol.pipe_data_received(fd, data)

   Called when the child process writes data into its stdout or stderr pipe.

   *fd* is the integer file descriptor of the pipe.

   *data* is a non-empty bytes object containing the received data.

SubprocessProtocol.pipe_connection_lost(fd, exc)

   Called when one of the pipes communicating with the child process is closed.

   *fd* is the integer file descriptor that was closed.

SubprocessProtocol.process_exited()

   Called when the child process has exited. It can be called before the "pipe_data_received()" and "pipe_connection_lost()" methods.

Examples
========

TCP Echo Server
---------------

Create a TCP echo server using the "loop.create_server()" method, send back received data, and close the connection:

   import asyncio


   class EchoServerProtocol(asyncio.Protocol):
       def connection_made(self, transport):
           peername = transport.get_extra_info('peername')
           print('Connection from {}'.format(peername))
           self.transport = transport

       def data_received(self, data):
           message = data.decode()
           print('Data received: {!r}'.format(message))

           print('Send: {!r}'.format(message))
           self.transport.write(data)

           print('Close the client socket')
           self.transport.close()


   async def main():
       # Get a reference to the event loop as we plan to use
       # low-level APIs.
       loop = asyncio.get_running_loop()

       server = await loop.create_server(
           EchoServerProtocol,
           '127.0.0.1', 8888)

       async with server:
           await server.serve_forever()


   asyncio.run(main())

See also: The TCP echo server using streams example uses the high-level "asyncio.start_server()" function.

TCP Echo Client
---------------

A TCP echo client, using the "loop.create_connection()" method, sends data and waits until the connection is closed:

   import asyncio


   class EchoClientProtocol(asyncio.Protocol):
       def __init__(self, message, on_con_lost):
           self.message = message
           self.on_con_lost = on_con_lost

       def connection_made(self, transport):
           transport.write(self.message.encode())
           print('Data sent: {!r}'.format(self.message))

       def data_received(self, data):
           print('Data received: {!r}'.format(data.decode()))

       def connection_lost(self, exc):
           print('The server closed the connection')
           self.on_con_lost.set_result(True)


   async def main():
       # Get a reference to the event loop as we plan to use
       # low-level APIs.
       loop = asyncio.get_running_loop()

       on_con_lost = loop.create_future()
       message = 'Hello World!'

       transport, protocol = await loop.create_connection(
           lambda: EchoClientProtocol(message, on_con_lost),
           '127.0.0.1', 8888)

       # Wait until the protocol signals that the connection
       # is lost and close the transport.
       try:
           await on_con_lost
       finally:
           transport.close()


   asyncio.run(main())

See also: The TCP echo client using streams example uses the high-level "asyncio.open_connection()" function.
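Buffered Protocol Sketch
------------------------

The "BufferedProtocol" class documented above has no dedicated example in this section. The following is a minimal sketch, not part of the original examples, of an echo server protocol that supplies its own receive buffer; the EchoBufferedProtocol name, the buffer size, and the address are illustrative:

   import asyncio


   class EchoBufferedProtocol(asyncio.BufferedProtocol):
       def connection_made(self, transport):
           self.transport = transport
           # Preallocate a single reusable receive buffer.
           self._buffer = bytearray(4096)

       def get_buffer(self, sizehint):
           # The event loop writes incoming data directly into this
           # buffer, avoiding an extra copy; it must not be empty.
           return self._buffer

       def buffer_updated(self, nbytes):
           # The first nbytes bytes of the buffer contain new data.
           self.transport.write(bytes(self._buffer[:nbytes]))

       def eof_received(self):
           # Returning None (the default) lets the transport close itself.
           return None


   async def main():
       loop = asyncio.get_running_loop()
       server = await loop.create_server(
           EchoBufferedProtocol, '127.0.0.1', 8888)
       async with server:
           await server.serve_forever()


   asyncio.run(main())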
UDP Echo Server --------------- A UDP echo server, using the "loop.create_datagram_endpoint()" method, sends back received data: import asyncio class EchoServerProtocol: def connection_made(self, transport): self.transport = transport def datagram_received(self, data, addr): message = data.decode() print('Received %r from %s' % (message, addr)) print('Send %r to %s' % (message, addr)) self.transport.sendto(data, addr) async def main(): print("Starting UDP server") # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() # One protocol instance will be created to serve all # client requests. transport, protocol = await loop.create_datagram_endpoint( EchoServerProtocol, local_addr=('127.0.0.1', 9999)) try: await asyncio.sleep(3600) # Serve for 1 hour. finally: transport.close() asyncio.run(main()) UDP Echo Client --------------- A UDP echo client, using the "loop.create_datagram_endpoint()" method, sends data and closes the transport when it receives the answer: import asyncio class EchoClientProtocol: def __init__(self, message, on_con_lost): self.message = message self.on_con_lost = on_con_lost self.transport = None def connection_made(self, transport): self.transport = transport print('Send:', self.message) self.transport.sendto(self.message.encode()) def datagram_received(self, data, addr): print("Received:", data.decode()) print("Close the socket") self.transport.close() def error_received(self, exc): print('Error received:', exc) def connection_lost(self, exc): print("Connection closed") self.on_con_lost.set_result(True) async def main(): # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() on_con_lost = loop.create_future() message = "Hello World!" transport, protocol = await loop.create_datagram_endpoint( lambda: EchoClientProtocol(message, on_con_lost), remote_addr=('127.0.0.1', 9999)) try: await on_con_lost finally: transport.close() asyncio.run(main()) Connecting Existing Sockets --------------------------- Wait until a socket receives data using the "loop.create_connection()" method with a protocol: import asyncio import socket class MyProtocol(asyncio.Protocol): def __init__(self, on_con_lost): self.transport = None self.on_con_lost = on_con_lost def connection_made(self, transport): self.transport = transport def data_received(self, data): print("Received:", data.decode()) # We are done: close the transport; # connection_lost() will be called automatically. self.transport.close() def connection_lost(self, exc): # The socket has been closed self.on_con_lost.set_result(True) async def main(): # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() on_con_lost = loop.create_future() # Create a pair of connected sockets rsock, wsock = socket.socketpair() # Register the socket to wait for data. transport, protocol = await loop.create_connection( lambda: MyProtocol(on_con_lost), sock=rsock) # Simulate the reception of data from the network. loop.call_soon(wsock.send, 'abc'.encode()) try: await protocol.on_con_lost finally: transport.close() wsock.close() asyncio.run(main()) See also: The watch a file descriptor for read events example uses the low- level "loop.add_reader()" method to register an FD. The register an open socket to wait for data using streams example uses high-level streams created by the "open_connection()" function in a coroutine. 
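Pausing and Resuming Reading
----------------------------

The "pause_reading()" and "resume_reading()" transport methods documented earlier can be used for read-side flow control. The sketch below is not part of the original examples; the ThrottledProtocol name and the one-second delay are illustrative. It stops the event loop from delivering more data while a previously received chunk is still being processed:

   import asyncio


   class ThrottledProtocol(asyncio.Protocol):
       def connection_made(self, transport):
           self.transport = transport

       def data_received(self, data):
           # Pause delivery until the current chunk has been handled.
           self.transport.pause_reading()
           asyncio.create_task(self.slow_handle(data))

       async def slow_handle(self, data):
           await asyncio.sleep(1)  # simulate slow processing
           print('Handled', len(data), 'bytes')
           if not self.transport.is_closing():
               # Both methods are idempotent, so resuming is safe
               # even if the transport was closed and reopened.
               self.transport.resume_reading()


   async def main():
       loop = asyncio.get_running_loop()
       server = await loop.create_server(
           ThrottledProtocol, '127.0.0.1', 8888)
       async with server:
           await server.serve_forever()


   asyncio.run(main())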
loop.subprocess_exec() and SubprocessProtocol
---------------------------------------------

An example of a subprocess protocol used to get the output of a subprocess and to wait for the subprocess exit.

The subprocess is created by the "loop.subprocess_exec()" method:

   import asyncio
   import sys


   class DateProtocol(asyncio.SubprocessProtocol):
       def __init__(self, exit_future):
           self.exit_future = exit_future
           self.output = bytearray()
           self.pipe_closed = False
           self.exited = False

       def pipe_connection_lost(self, fd, exc):
           self.pipe_closed = True
           self.check_for_exit()

       def pipe_data_received(self, fd, data):
           self.output.extend(data)

       def process_exited(self):
           self.exited = True
           # process_exited() method can be called before
           # pipe_connection_lost() method: wait until both methods are
           # called.
           self.check_for_exit()

       def check_for_exit(self):
           if self.pipe_closed and self.exited:
               self.exit_future.set_result(True)


   async def get_date():
       # Get a reference to the event loop as we plan to use
       # low-level APIs.
       loop = asyncio.get_running_loop()

       code = 'import datetime; print(datetime.datetime.now())'
       exit_future = asyncio.Future(loop=loop)

       # Create the subprocess controlled by DateProtocol;
       # redirect the standard output into a pipe.
       transport, protocol = await loop.subprocess_exec(
           lambda: DateProtocol(exit_future),
           sys.executable, '-c', code,
           stdin=None, stderr=None)

       # Wait for the subprocess exit using the process_exited()
       # method of the protocol.
       await exit_future

       # Close the stdout pipe.
       transport.close()

       # Read the output which was collected by the
       # pipe_data_received() method of the protocol.
       data = bytes(protocol.output)
       return data.decode('ascii').rstrip()


   date = asyncio.run(get_date())
   print(f"Current date: {date}")

See also the same example written using high-level APIs.

Queues
******

**Source code:** Lib/asyncio/queues.py

======================================================================

asyncio queues are designed to be similar to the classes of the "queue" module. Although asyncio queues are not thread-safe, they are designed to be used specifically in async/await code.

Note that methods of asyncio queues don’t have a *timeout* parameter; use the "asyncio.wait_for()" function to do queue operations with a timeout.

See also the Examples section below.

Queue
=====

class asyncio.Queue(maxsize=0)

   A first in, first out (FIFO) queue.

   If *maxsize* is less than or equal to zero, the queue size is infinite. If it is an integer greater than "0", then "await put()" blocks when the queue reaches *maxsize* until an item is removed by "get()".

   Unlike the standard library threading "queue", the size of the queue is always known and can be returned by calling the "qsize()" method.

   Changed in version 3.10: Removed the *loop* parameter.

   This class is not thread safe.

   maxsize

      Number of items allowed in the queue.

   empty()

      Return "True" if the queue is empty, "False" otherwise.

   full()

      Return "True" if there are "maxsize" items in the queue.

      If the queue was initialized with "maxsize=0" (the default), then "full()" never returns "True".

   async get()

      Remove and return an item from the queue. If the queue is empty, wait until an item is available.

      Raises "QueueShutDown" if the queue has been shut down and is empty, or if the queue has been shut down immediately.

   get_nowait()

      Return an item if one is immediately available, else raise "QueueEmpty".

   async join()

      Block until all items in the queue have been received and processed.

      The count of unfinished tasks goes up whenever an item is added to the queue.
The count goes down whenever a consumer coroutine calls "task_done()" to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, "join()" unblocks.

   async put(item)

      Put an item into the queue. If the queue is full, wait until a free slot is available before adding the item.

      Raises "QueueShutDown" if the queue has been shut down.

   put_nowait(item)

      Put an item into the queue without blocking.

      If no free slot is immediately available, raise "QueueFull".

   qsize()

      Return the number of items in the queue.

   shutdown(immediate=False)

      Shut down the queue, making "get()" and "put()" raise "QueueShutDown".

      By default, "get()" on a shut down queue will only raise once the queue is empty. Set *immediate* to true to make "get()" raise immediately instead.

      All blocked callers of "put()" and "get()" will be unblocked. If *immediate* is true, a task will be marked as done for each remaining item in the queue, which may unblock callers of "join()".

      Added in version 3.13.

   task_done()

      Indicate that a formerly enqueued work item is complete.

      Used by queue consumers. For each "get()" used to fetch a work item, a subsequent call to "task_done()" tells the queue that the processing on the work item is complete.

      If a "join()" is currently blocking, it will resume when all items have been processed (meaning that a "task_done()" call was received for every item that had been "put()" into the queue).

      "shutdown(immediate=True)" calls "task_done()" for each remaining item in the queue.

      Raises "ValueError" if called more times than there were items placed in the queue.

Priority Queue
==============

class asyncio.PriorityQueue

   A variant of "Queue"; retrieves entries in priority order (lowest first).

   Entries are typically tuples of the form "(priority_number, data)".

LIFO Queue
==========

class asyncio.LifoQueue

   A variant of "Queue" that retrieves most recently added entries first (last in, first out).

Exceptions
==========

exception asyncio.QueueEmpty

   This exception is raised when the "get_nowait()" method is called on an empty queue.

exception asyncio.QueueFull

   Exception raised when the "put_nowait()" method is called on a queue that has reached its *maxsize*.

exception asyncio.QueueShutDown

   Exception raised when "put()" or "get()" is called on a queue which has been shut down.

   Added in version 3.13.

Examples
========

Queues can be used to distribute workload between several concurrent tasks:

   import asyncio
   import random
   import time


   async def worker(name, queue):
       while True:
           # Get a "work item" out of the queue.
           sleep_for = await queue.get()

           # Sleep for the "sleep_for" seconds.
           await asyncio.sleep(sleep_for)

           # Notify the queue that the "work item" has been processed.
           queue.task_done()

           print(f'{name} has slept for {sleep_for:.2f} seconds')


   async def main():
       # Create a queue that we will use to store our "workload".
       queue = asyncio.Queue()

       # Generate random timings and put them into the queue.
       total_sleep_time = 0
       for _ in range(20):
           sleep_for = random.uniform(0.05, 1.0)
           total_sleep_time += sleep_for
           queue.put_nowait(sleep_for)

       # Create three worker tasks to process the queue concurrently.
       tasks = []
       for i in range(3):
           task = asyncio.create_task(worker(f'worker-{i}', queue))
           tasks.append(task)

       # Wait until the queue is fully processed.
       started_at = time.monotonic()
       await queue.join()
       total_slept_for = time.monotonic() - started_at

       # Cancel our worker tasks.
       for task in tasks:
           task.cancel()

       # Wait until all worker tasks are cancelled.
       await asyncio.gather(*tasks, return_exceptions=True)

       print('====')
       print(f'3 workers slept in parallel for {total_slept_for:.2f} seconds')
       print(f'total expected sleep time: {total_sleep_time:.2f} seconds')


   asyncio.run(main())

Runners
*******

**Source code:** Lib/asyncio/runners.py

This section outlines high-level asyncio primitives to run asyncio code.

They are built on top of an event loop with the aim of simplifying async code usage for common, widespread scenarios.

* Running an asyncio Program

* Runner context manager

* Handling Keyboard Interruption

Running an asyncio Program
==========================

asyncio.run(coro, *, debug=None, loop_factory=None)

   Execute the *coroutine* *coro* and return the result.

   This function runs the passed coroutine, taking care of managing the asyncio event loop, *finalizing asynchronous generators*, and closing the executor.

   This function cannot be called when another asyncio event loop is running in the same thread.

   If *debug* is "True", the event loop will be run in debug mode. "False" disables debug mode explicitly. "None" is used to respect the global Debug Mode settings.

   If *loop_factory* is not "None", it is used to create a new event loop; otherwise "asyncio.new_event_loop()" is used. The loop is closed at the end. This function should be used as a main entry point for asyncio programs, and should ideally only be called once. It is recommended to use *loop_factory* to configure the event loop instead of policies. Passing "asyncio.EventLoop" allows running asyncio without the policy system.

   The executor is given a timeout duration of 5 minutes to shut down. If the executor hasn’t finished within that duration, a warning is emitted and the executor is closed.

   Example:

      async def main():
          await asyncio.sleep(1)
          print('hello')

      asyncio.run(main())

   Added in version 3.7.

   Changed in version 3.9: Updated to use "loop.shutdown_default_executor()".

   Changed in version 3.10: *debug* is "None" by default to respect the global debug mode settings.

   Changed in version 3.12: Added *loop_factory* parameter.

Runner context manager
======================

class asyncio.Runner(*, debug=None, loop_factory=None)

   A context manager that simplifies *multiple* async function calls in the same context.

   Sometimes several top-level async functions should be called in the same event loop and share the same "contextvars.Context".

   If *debug* is "True", the event loop will be run in debug mode. "False" disables debug mode explicitly. "None" is used to respect the global Debug Mode settings.

   *loop_factory* can be used to override the loop creation. It is the responsibility of the *loop_factory* to set the created loop as the current one. By default "asyncio.new_event_loop()" is used and set as the current event loop with "asyncio.set_event_loop()" if *loop_factory* is "None".

   Basically, the "asyncio.run()" example above can be rewritten with the runner:

      async def main():
          await asyncio.sleep(1)
          print('hello')

      with asyncio.Runner() as runner:
          runner.run(main())

   Added in version 3.11.

   run(coro, *, context=None)

      Run a *coroutine* *coro* in the embedded loop.

      Return the coroutine’s result or raise its exception.

      An optional keyword-only *context* argument allows specifying a custom "contextvars.Context" for the *coro* to run in. The runner’s default context is used if "None".

      This function cannot be called when another asyncio event loop is running in the same thread.

   close()

      Close the runner.
      Finalize asynchronous generators, shut down the default executor, close the event loop and release the embedded "contextvars.Context".

   get_loop()

      Return the event loop associated with the runner instance.

   Note: "Runner" uses a lazy initialization strategy: its constructor doesn’t initialize underlying low-level structures. The embedded *loop* and *context* are created when entering the "with" body or on the first call to "run()" or "get_loop()".

Handling Keyboard Interruption
==============================

Added in version 3.11.

When "signal.SIGINT" is raised by "Ctrl"-"C", a "KeyboardInterrupt" exception is raised in the main thread by default. However, this doesn’t work with "asyncio" because it can interrupt asyncio internals and leave the program hanging on exit.

To mitigate this issue, "asyncio" handles "signal.SIGINT" as follows:

1. "asyncio.Runner.run()" installs a custom "signal.SIGINT" handler before any user code is executed and removes it when exiting from the function.

2. The "Runner" creates the main task for the passed coroutine for its execution.

3. When "signal.SIGINT" is raised by "Ctrl"-"C", the custom signal handler cancels the main task by calling "asyncio.Task.cancel()", which raises "asyncio.CancelledError" inside the main task. This causes the Python stack to unwind; "try/except" and "try/finally" blocks can be used for resource cleanup. After the main task is cancelled, "asyncio.Runner.run()" raises "KeyboardInterrupt".

4. A user could write a tight loop which cannot be interrupted by "asyncio.Task.cancel()", in which case a second following "Ctrl"-"C" immediately raises "KeyboardInterrupt" without cancelling the main task.

Streams
*******

**Source code:** Lib/asyncio/streams.py

======================================================================

Streams are high-level async/await-ready primitives to work with network connections. Streams allow sending and receiving data without using callbacks or low-level protocols and transports.

Here is an example of a TCP echo client written using asyncio streams:

   import asyncio

   async def tcp_echo_client(message):
       reader, writer = await asyncio.open_connection(
           '127.0.0.1', 8888)

       print(f'Send: {message!r}')
       writer.write(message.encode())
       await writer.drain()

       data = await reader.read(100)
       print(f'Received: {data.decode()!r}')

       print('Close the connection')
       writer.close()
       await writer.wait_closed()

   asyncio.run(tcp_echo_client('Hello World!'))

See also the Examples section below.

-[ Stream Functions ]-

The following top-level asyncio functions can be used to create and work with streams:

async asyncio.open_connection(host=None, port=None, *, limit=None, ssl=None, family=0, proto=0, flags=0, sock=None, local_addr=None, server_hostname=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None, happy_eyeballs_delay=None, interleave=None)

   Establish a network connection and return a pair of "(reader, writer)" objects.

   The returned *reader* and *writer* objects are instances of the "StreamReader" and "StreamWriter" classes.

   *limit* determines the buffer size limit used by the returned "StreamReader" instance. By default the *limit* is set to 64 KiB.

   The rest of the arguments are passed directly to "loop.create_connection()".

   Note: The *sock* argument transfers ownership of the socket to the "StreamWriter" created. To close the socket, call its "close()" method.

   Changed in version 3.7: Added the *ssl_handshake_timeout* parameter.

   Changed in version 3.8: Added the *happy_eyeballs_delay* and *interleave* parameters.
   Changed in version 3.10: Removed the *loop* parameter.

   Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.

async asyncio.start_server(client_connected_cb, host=None, port=None, *, limit=None, family=socket.AF_UNSPEC, flags=socket.AI_PASSIVE, sock=None, backlog=100, ssl=None, reuse_address=None, reuse_port=None, keep_alive=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None, start_serving=True)

   Start a socket server.

   The *client_connected_cb* callback is called whenever a new client connection is established. It receives a "(reader, writer)" pair as two arguments, instances of the "StreamReader" and "StreamWriter" classes.

   *client_connected_cb* can be a plain callable or a coroutine function; if it is a coroutine function, it will be automatically scheduled as a "Task".

   *limit* determines the buffer size limit used by the returned "StreamReader" instance. By default the *limit* is set to 64 KiB.

   The rest of the arguments are passed directly to "loop.create_server()".

   Note: The *sock* argument transfers ownership of the socket to the server created. To close the socket, call the server’s "close()" method.

   Changed in version 3.7: Added the *ssl_handshake_timeout* and *start_serving* parameters.

   Changed in version 3.10: Removed the *loop* parameter.

   Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.

   Changed in version 3.13: Added the *keep_alive* parameter.

-[ Unix Sockets ]-

async asyncio.open_unix_connection(path=None, *, limit=None, ssl=None, sock=None, server_hostname=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None)

   Establish a Unix socket connection and return a pair of "(reader, writer)".

   Similar to "open_connection()" but operates on Unix sockets.

   See also the documentation of "loop.create_unix_connection()".

   Note: The *sock* argument transfers ownership of the socket to the "StreamWriter" created. To close the socket, call its "close()" method.

   Availability: Unix.

   Changed in version 3.7: Added the *ssl_handshake_timeout* parameter. The *path* parameter can now be a *path-like object*.

   Changed in version 3.10: Removed the *loop* parameter.

   Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.

async asyncio.start_unix_server(client_connected_cb, path=None, *, limit=None, sock=None, backlog=100, ssl=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None, start_serving=True, cleanup_socket=True)

   Start a Unix socket server.

   Similar to "start_server()" but works with Unix sockets.

   If *cleanup_socket* is true, then the Unix socket will automatically be removed from the filesystem when the server is closed, unless the socket has been replaced after the server has been created.

   See also the documentation of "loop.create_unix_server()".

   Note: The *sock* argument transfers ownership of the socket to the server created. To close the socket, call the server’s "close()" method.

   Availability: Unix.

   Changed in version 3.7: Added the *ssl_handshake_timeout* and *start_serving* parameters. The *path* parameter can now be a *path-like object*.

   Changed in version 3.10: Removed the *loop* parameter.

   Changed in version 3.11: Added the *ssl_shutdown_timeout* parameter.

   Changed in version 3.13: Added the *cleanup_socket* parameter.

StreamReader
============

class asyncio.StreamReader

   Represents a reader object that provides APIs to read data from the IO stream.

   As an *asynchronous iterable*, the object supports the "async for" statement.
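   For example, a minimal sketch of line-by-line iteration (assuming *reader* was obtained from "open_connection()" or a "start_server()" callback; this is not a complete program):

      async for line in reader:
          # Each iteration yields one line, as readline() does,
          # until EOF is reached.
          print(line.decode().rstrip())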
   It is not recommended to instantiate *StreamReader* objects directly; use "open_connection()" and "start_server()" instead.

   feed_eof()

      Acknowledge the EOF.

   async read(n=-1)

      Read up to *n* bytes from the stream.

      If *n* is not provided or set to "-1", read until EOF, then return all read "bytes". If EOF was received and the internal buffer is empty, return an empty "bytes" object.

      If *n* is "0", return an empty "bytes" object immediately.

      If *n* is positive, return at most *n* available "bytes" as soon as at least 1 byte is available in the internal buffer. If EOF is received before any byte is read, return an empty "bytes" object.

   async readline()

      Read one line, where “line” is a sequence of bytes ending with "\n".

      If EOF is received and "\n" was not found, the method returns partially read data.

      If EOF is received and the internal buffer is empty, return an empty "bytes" object.

   async readexactly(n)

      Read exactly *n* bytes.

      Raise an "IncompleteReadError" if EOF is reached before *n* can be read. Use the "IncompleteReadError.partial" attribute to get the partially read data.

   async readuntil(separator=b'\n')

      Read data from the stream until *separator* is found.

      On success, the data and separator will be removed from the internal buffer (consumed). Returned data will include the separator at the end.

      If the amount of data read exceeds the configured stream limit, a "LimitOverrunError" exception is raised, and the data is left in the internal buffer and can be read again.

      If EOF is reached before the complete separator is found, an "IncompleteReadError" exception is raised, and the internal buffer is reset. The "IncompleteReadError.partial" attribute may contain a portion of the separator.

      The *separator* may also be a tuple of separators. In this case, the return value will be the shortest possible data that has any one of the separators as its suffix. For the purposes of "LimitOverrunError", the shortest possible separator is considered to be the one that matched.

      Added in version 3.5.2.

      Changed in version 3.13: The *separator* parameter may now be a "tuple" of separators.

   at_eof()

      Return "True" if the buffer is empty and "feed_eof()" was called.

StreamWriter
============

class asyncio.StreamWriter

   Represents a writer object that provides APIs to write data to the IO stream.

   It is not recommended to instantiate *StreamWriter* objects directly; use "open_connection()" and "start_server()" instead.

   write(data)

      The method attempts to write the *data* to the underlying socket immediately. If that fails, the data is queued in an internal write buffer until it can be sent.

      The method should be used along with the "drain()" method:

         stream.write(data)
         await stream.drain()

   writelines(data)

      The method writes a list (or any iterable) of bytes to the underlying socket immediately. If that fails, the data is queued in an internal write buffer until it can be sent.

      The method should be used along with the "drain()" method:

         stream.writelines(lines)
         await stream.drain()

   close()

      The method closes the stream and the underlying socket.

      Although not mandatory, the method should be used along with the "wait_closed()" method:

         stream.close()
         await stream.wait_closed()

   can_write_eof()

      Return "True" if the underlying transport supports the "write_eof()" method, "False" otherwise.

   write_eof()

      Close the write end of the stream after the buffered write data is flushed.

   transport

      Return the underlying asyncio transport.
   get_extra_info(name, default=None)

      Access optional transport information; see "BaseTransport.get_extra_info()" for details.

   async drain()

      Wait until it is appropriate to resume writing to the stream. Example:

         writer.write(data)
         await writer.drain()

      This is a flow control method that interacts with the underlying IO write buffer. When the size of the buffer reaches the high watermark, *drain()* blocks until the size of the buffer is drained down to the low watermark and writing can be resumed. When there is nothing to wait for, "drain()" returns immediately.

   async start_tls(sslcontext, *, server_hostname=None, ssl_handshake_timeout=None, ssl_shutdown_timeout=None)

      Upgrade an existing stream-based connection to TLS.

      Parameters:

      * *sslcontext*: a configured instance of "SSLContext".

      * *server_hostname*: sets or overrides the host name that the target server’s certificate will be matched against.

      * *ssl_handshake_timeout* is the time in seconds to wait for the TLS handshake to complete before aborting the connection. "60.0" seconds if "None" (default).

      * *ssl_shutdown_timeout* is the time in seconds to wait for the SSL shutdown to complete before aborting the connection. "30.0" seconds if "None" (default).

      Added in version 3.11.

      Changed in version 3.12: Added the *ssl_shutdown_timeout* parameter.

   is_closing()

      Return "True" if the stream is closed or in the process of being closed.

      Added in version 3.7.

   async wait_closed()

      Wait until the stream is closed.

      Should be called after "close()" to wait until the underlying connection is closed, ensuring that all data has been flushed before e.g. exiting the program.

      Added in version 3.7.

Examples
========

TCP echo client using streams
-----------------------------

TCP echo client using the "asyncio.open_connection()" function:

   import asyncio

   async def tcp_echo_client(message):
       reader, writer = await asyncio.open_connection(
           '127.0.0.1', 8888)

       print(f'Send: {message!r}')
       writer.write(message.encode())
       await writer.drain()

       data = await reader.read(100)
       print(f'Received: {data.decode()!r}')

       print('Close the connection')
       writer.close()
       await writer.wait_closed()

   asyncio.run(tcp_echo_client('Hello World!'))

See also: The TCP echo client protocol example uses the low-level "loop.create_connection()" method.

TCP echo server using streams
-----------------------------

TCP echo server using the "asyncio.start_server()" function:

   import asyncio

   async def handle_echo(reader, writer):
       data = await reader.read(100)
       message = data.decode()
       addr = writer.get_extra_info('peername')

       print(f"Received {message!r} from {addr!r}")

       print(f"Send: {message!r}")
       writer.write(data)
       await writer.drain()

       print("Close the connection")
       writer.close()
       await writer.wait_closed()

   async def main():
       server = await asyncio.start_server(
           handle_echo, '127.0.0.1', 8888)

       addrs = ', '.join(str(sock.getsockname()) for sock in server.sockets)
       print(f'Serving on {addrs}')

       async with server:
           await server.serve_forever()

   asyncio.run(main())

See also: The TCP echo server protocol example uses the "loop.create_server()" method.
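Reading Until One of Several Separators
---------------------------------------

"StreamReader.readuntil()" (documented above) also accepts a tuple of separators on Python 3.13 and later. The following is a small sketch, not part of the original examples; the address is illustrative, and a line-oriented server is assumed to be listening there:

   import asyncio


   async def read_one_line():
       reader, writer = await asyncio.open_connection('127.0.0.1', 8888)
       try:
           # Accept either LF or CRLF line endings; the separator that
           # matched is included at the end of the returned data.
           line = await reader.readuntil((b'\r\n', b'\n'))
           print('Got:', line)
       except asyncio.IncompleteReadError as exc:
           # EOF arrived before any separator was found; the partial
           # data is still available.
           print('Partial:', exc.partial)
       finally:
           writer.close()
           await writer.wait_closed()


   asyncio.run(read_one_line())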
Get HTTP headers ---------------- Simple example querying HTTP headers of the URL passed on the command line: import asyncio import urllib.parse import sys async def print_http_headers(url): url = urllib.parse.urlsplit(url) if url.scheme == 'https': reader, writer = await asyncio.open_connection( url.hostname, 443, ssl=True) else: reader, writer = await asyncio.open_connection( url.hostname, 80) query = ( f"HEAD {url.path or '/'} HTTP/1.0\r\n" f"Host: {url.hostname}\r\n" f"\r\n" ) writer.write(query.encode('latin-1')) while True: line = await reader.readline() if not line: break line = line.decode('latin1').rstrip() if line: print(f'HTTP header> {line}') # Ignore the body, close the socket writer.close() await writer.wait_closed() url = sys.argv[1] asyncio.run(print_http_headers(url)) Usage: python example.py http://example.com/path/page.html or with HTTPS: python example.py https://example.com/path/page.html Register an open socket to wait for data using streams ------------------------------------------------------ Coroutine waiting until a socket receives data using the "open_connection()" function: import asyncio import socket async def wait_for_data(): # Get a reference to the current event loop because # we want to access low-level APIs. loop = asyncio.get_running_loop() # Create a pair of connected sockets. rsock, wsock = socket.socketpair() # Register the open socket to wait for data. reader, writer = await asyncio.open_connection(sock=rsock) # Simulate the reception of data from the network loop.call_soon(wsock.send, 'abc'.encode()) # Wait for data data = await reader.read(100) # Got data, we are done: close the socket print("Received:", data.decode()) writer.close() await writer.wait_closed() # Close the second socket wsock.close() asyncio.run(wait_for_data()) See also: The register an open socket to wait for data using a protocol example uses a low-level protocol and the "loop.create_connection()" method. The watch a file descriptor for read events example uses the low- level "loop.add_reader()" method to watch a file descriptor. Subprocesses ************ **Source code:** Lib/asyncio/subprocess.py, Lib/asyncio/base_subprocess.py ====================================================================== This section describes high-level async/await asyncio APIs to create and manage subprocesses. Here’s an example of how asyncio can run a shell command and obtain its result: import asyncio async def run(cmd): proc = await asyncio.create_subprocess_shell( cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE) stdout, stderr = await proc.communicate() print(f'[{cmd!r} exited with {proc.returncode}]') if stdout: print(f'[stdout]\n{stdout.decode()}') if stderr: print(f'[stderr]\n{stderr.decode()}') asyncio.run(run('ls /zzz')) will print: ['ls /zzz' exited with 1] [stderr] ls: /zzz: No such file or directory Because all asyncio subprocess functions are asynchronous and asyncio provides many tools to work with such functions, it is easy to execute and monitor multiple subprocesses in parallel. It is indeed trivial to modify the above example to run several commands simultaneously: async def main(): await asyncio.gather( run('ls /zzz'), run('sleep 1; echo "hello"')) asyncio.run(main()) See also the Examples subsection. Creating Subprocesses ===================== async asyncio.create_subprocess_exec(program, *args, stdin=None, stdout=None, stderr=None, limit=None, **kwds) Create a subprocess. 
   The *limit* argument sets the buffer limit for "StreamReader" wrappers for "stdout" and "stderr" (if "subprocess.PIPE" is passed to the *stdout* and *stderr* arguments).

   Return a "Process" instance.

   See the documentation of "loop.subprocess_exec()" for other parameters.

   Changed in version 3.10: Removed the *loop* parameter.

async asyncio.create_subprocess_shell(cmd, stdin=None, stdout=None, stderr=None, limit=None, **kwds)

   Run the *cmd* shell command.

   The *limit* argument sets the buffer limit for "StreamReader" wrappers for "stdout" and "stderr" (if "subprocess.PIPE" is passed to the *stdout* and *stderr* arguments).

   Return a "Process" instance.

   See the documentation of "loop.subprocess_shell()" for other parameters.

   Important: It is the application’s responsibility to ensure that all whitespace and special characters are quoted appropriately to avoid shell injection vulnerabilities. The "shlex.quote()" function can be used to properly escape whitespace and special shell characters in strings that are going to be used to construct shell commands.

   Changed in version 3.10: Removed the *loop* parameter.

Note: Subprocesses are available for Windows if a "ProactorEventLoop" is used. See Subprocess Support on Windows for details.

See also: asyncio also has the following *low-level* APIs to work with subprocesses: "loop.subprocess_exec()", "loop.subprocess_shell()", "loop.connect_read_pipe()", "loop.connect_write_pipe()", as well as the Subprocess Transports and Subprocess Protocols.

Constants
=========

asyncio.subprocess.PIPE

   Can be passed to the *stdin*, *stdout* or *stderr* parameters.

   If *PIPE* is passed to the *stdin* argument, the "Process.stdin" attribute will point to a "StreamWriter" instance.

   If *PIPE* is passed to the *stdout* or *stderr* arguments, the "Process.stdout" and "Process.stderr" attributes will point to "StreamReader" instances.

asyncio.subprocess.STDOUT

   Special value that can be used as the *stderr* argument and indicates that standard error should be redirected into standard output.

asyncio.subprocess.DEVNULL

   Special value that can be used as the *stdin*, *stdout* or *stderr* argument to process creation functions. It indicates that the special file "os.devnull" will be used for the corresponding subprocess stream.

Interacting with Subprocesses
=============================

Both the "create_subprocess_exec()" and "create_subprocess_shell()" functions return instances of the *Process* class. *Process* is a high-level wrapper that allows communicating with subprocesses and watching for their completion.

class asyncio.subprocess.Process

   An object that wraps OS processes created by the "create_subprocess_exec()" and "create_subprocess_shell()" functions.

   This class is designed to have a similar API to the "subprocess.Popen" class, but there are some notable differences:

   * unlike Popen, Process instances do not have an equivalent to the "poll()" method;

   * the "communicate()" and "wait()" methods don’t have a *timeout* parameter: use the "wait_for()" function;

   * the "Process.wait()" method is asynchronous, whereas the "subprocess.Popen.wait()" method is implemented as a blocking busy loop;

   * the *universal_newlines* parameter is not supported.

   This class is not thread safe.

   See also the Subprocess and Threads section.

   async wait()

      Wait for the child process to terminate.

      Set and return the "returncode" attribute.

      Note: This method can deadlock when using "stdout=PIPE" or "stderr=PIPE" and the child process generates so much output that it blocks waiting for the OS pipe buffer to accept more data.
Use the "communicate()" method when using pipes to avoid this condition. async communicate(input=None) Interact with process: 1. send data to *stdin* (if *input* is not "None"); 2. closes *stdin*; 3. read data from *stdout* and *stderr*, until EOF is reached; 4. wait for process to terminate. The optional *input* argument is the data ("bytes" object) that will be sent to the child process. Return a tuple "(stdout_data, stderr_data)". If either "BrokenPipeError" or "ConnectionResetError" exception is raised when writing *input* into *stdin*, the exception is ignored. This condition occurs when the process exits before all data are written into *stdin*. If it is desired to send data to the process’ *stdin*, the process needs to be created with "stdin=PIPE". Similarly, to get anything other than "None" in the result tuple, the process has to be created with "stdout=PIPE" and/or "stderr=PIPE" arguments. Note, that the data read is buffered in memory, so do not use this method if the data size is large or unlimited. Changed in version 3.12: *stdin* gets closed when "input=None" too. send_signal(signal) Sends the signal *signal* to the child process. Note: On Windows, "SIGTERM" is an alias for "terminate()". "CTRL_C_EVENT" and "CTRL_BREAK_EVENT" can be sent to processes started with a *creationflags* parameter which includes "CREATE_NEW_PROCESS_GROUP". terminate() Stop the child process. On POSIX systems this method sends "SIGTERM" to the child process. On Windows the Win32 API function "TerminateProcess()" is called to stop the child process. kill() Kill the child process. On POSIX systems this method sends "SIGKILL" to the child process. On Windows this method is an alias for "terminate()". stdin Standard input stream ("StreamWriter") or "None" if the process was created with "stdin=None". stdout Standard output stream ("StreamReader") or "None" if the process was created with "stdout=None". stderr Standard error stream ("StreamReader") or "None" if the process was created with "stderr=None". Warning: Use the "communicate()" method rather than "process.stdin.write()", "await process.stdout.read()" or "await process.stderr.read()". This avoids deadlocks due to streams pausing reading or writing and blocking the child process. pid Process identification number (PID). Note that for processes created by the "create_subprocess_shell()" function, this attribute is the PID of the spawned shell. returncode Return code of the process when it exits. A "None" value indicates that the process has not terminated yet. A negative value "-N" indicates that the child was terminated by signal "N" (POSIX only). Subprocess and Threads ---------------------- Standard asyncio event loop supports running subprocesses from different threads by default. On Windows subprocesses are provided by "ProactorEventLoop" only (default), "SelectorEventLoop" has no subprocess support. On UNIX *child watchers* are used for subprocess finish waiting, see Process Watchers for more info. Changed in version 3.8: UNIX switched to use "ThreadedChildWatcher" for spawning subprocesses from different threads without any limitation.Spawning a subprocess with *inactive* current child watcher raises "RuntimeError". Note that alternative event loop implementations might have own limitations; please refer to their documentation. See also: The Concurrency and multithreading in asyncio section. Examples -------- An example using the "Process" class to control a subprocess and the "StreamReader" class to read from its standard output. 
The subprocess is created by the "create_subprocess_exec()" function:

   import asyncio
   import sys

   async def get_date():
       code = 'import datetime; print(datetime.datetime.now())'

       # Create the subprocess; redirect the standard output
       # into a pipe.
       proc = await asyncio.create_subprocess_exec(
           sys.executable, '-c', code,
           stdout=asyncio.subprocess.PIPE)

       # Read one line of output.
       data = await proc.stdout.readline()
       line = data.decode('ascii').rstrip()

       # Wait for the subprocess exit.
       await proc.wait()
       return line

   date = asyncio.run(get_date())
   print(f"Current date: {date}")

See also the same example written using low-level APIs.

Synchronization Primitives
**************************

**Source code:** Lib/asyncio/locks.py

======================================================================

asyncio synchronization primitives are designed to be similar to
those of the "threading" module with two important caveats:

* asyncio primitives are not thread-safe, therefore they should not
  be used for OS thread synchronization (use "threading" for that);

* methods of these synchronization primitives do not accept the
  *timeout* argument; use the "asyncio.wait_for()" function to
  perform operations with timeouts.

asyncio has the following basic synchronization primitives:

* "Lock"

* "Event"

* "Condition"

* "Semaphore"

* "BoundedSemaphore"

* "Barrier"

======================================================================

Lock
====

class asyncio.Lock

   Implements a mutex lock for asyncio tasks. Not thread-safe.

   An asyncio lock can be used to guarantee exclusive access to a
   shared resource.

   The preferred way to use a Lock is an "async with" statement:

      lock = asyncio.Lock()

      # ... later
      async with lock:
          # access shared state

   which is equivalent to:

      lock = asyncio.Lock()

      # ... later
      await lock.acquire()
      try:
          # access shared state
      finally:
          lock.release()

   Changed in version 3.10: Removed the *loop* parameter.

   async acquire()

      Acquire the lock.

      This method waits until the lock is *unlocked*, sets it to
      *locked* and returns "True".

      When more than one coroutine is blocked in "acquire()" waiting
      for the lock to be unlocked, only one coroutine eventually
      proceeds.

      Acquiring a lock is *fair*: the coroutine that proceeds will be
      the first coroutine that started waiting on the lock.

   release()

      Release the lock.

      When the lock is *locked*, reset it to *unlocked* and return.

      If the lock is *unlocked*, a "RuntimeError" is raised.

   locked()

      Return "True" if the lock is *locked*.

Event
=====

class asyncio.Event

   An event object. Not thread-safe.

   An asyncio event can be used to notify multiple asyncio tasks that
   some event has happened.

   An Event object manages an internal flag that can be set to *true*
   with the "set()" method and reset to *false* with the "clear()"
   method. The "wait()" method blocks until the flag is set to
   *true*. The flag is set to *false* initially.

   Changed in version 3.10: Removed the *loop* parameter.

   Example:

      async def waiter(event):
          print('waiting for it ...')
          await event.wait()
          print('... got it!')

      async def main():
          # Create an Event object.
          event = asyncio.Event()

          # Spawn a Task to wait until 'event' is set.
          waiter_task = asyncio.create_task(waiter(event))

          # Sleep for 1 second and set the event.
          await asyncio.sleep(1)
          event.set()

          # Wait until the waiter task is finished.
          await waiter_task

      asyncio.run(main())

   async wait()

      Wait until the event is set.

      If the event is set, return "True" immediately. Otherwise block
      until another task calls "set()".

   set()

      Set the event.
      All tasks waiting for event to be set will be immediately
      awakened.

   clear()

      Clear (unset) the event. Tasks awaiting on "wait()" will now
      block until the "set()" method is called again.

   is_set()

      Return "True" if the event is set.

Condition
=========

class asyncio.Condition(lock=None)

   A Condition object. Not thread-safe.

   An asyncio condition primitive can be used by a task to wait for
   some event to happen and then get exclusive access to a shared
   resource.

   In essence, a Condition object combines the functionality of an
   "Event" and a "Lock". It is possible to have multiple Condition
   objects share one Lock, which allows coordinating exclusive access
   to a shared resource between different tasks interested in
   particular states of that shared resource.

   The optional *lock* argument must be a "Lock" object or "None". In
   the latter case a new Lock object is created automatically.

   Changed in version 3.10: Removed the *loop* parameter.

   The preferred way to use a Condition is an "async with" statement:

      cond = asyncio.Condition()

      # ... later
      async with cond:
          await cond.wait()

   which is equivalent to:

      cond = asyncio.Condition()

      # ... later
      await cond.acquire()
      try:
          await cond.wait()
      finally:
          cond.release()

   async acquire()

      Acquire the underlying lock.

      This method waits until the underlying lock is *unlocked*, sets
      it to *locked* and returns "True".

   notify(n=1)

      Wake up *n* tasks (1 by default) waiting on this condition. If
      fewer than *n* tasks are waiting they are all awakened.

      The lock must be acquired before this method is called and
      released shortly after. If called with an *unlocked* lock a
      "RuntimeError" is raised.

   locked()

      Return "True" if the underlying lock is acquired.

   notify_all()

      Wake up all tasks waiting on this condition.

      This method acts like "notify()", but wakes up all waiting
      tasks.

      The lock must be acquired before this method is called and
      released shortly after. If called with an *unlocked* lock a
      "RuntimeError" is raised.

   release()

      Release the underlying lock.

      When invoked on an unlocked lock, a "RuntimeError" is raised.

   async wait()

      Wait until notified.

      If the calling task has not acquired the lock when this method
      is called, a "RuntimeError" is raised.

      This method releases the underlying lock, and then blocks until
      it is awakened by a "notify()" or "notify_all()" call. Once
      awakened, the Condition re-acquires its lock and this method
      returns "True".

      Note that a task *may* return from this call spuriously, which
      is why the caller should always re-check the state and be
      prepared to "wait()" again. For this reason, you may prefer to
      use "wait_for()" instead.

   async wait_for(predicate)

      Wait until a predicate becomes *true*.

      The predicate must be a callable whose result will be
      interpreted as a boolean value. The method will repeatedly
      "wait()" until the predicate evaluates to *true*. The final
      value is the return value.

Semaphore
=========

class asyncio.Semaphore(value=1)

   A Semaphore object. Not thread-safe.

   A semaphore manages an internal counter which is decremented by
   each "acquire()" call and incremented by each "release()" call.
   The counter can never go below zero; when "acquire()" finds that
   it is zero, it blocks, waiting until some task calls "release()".

   The optional *value* argument gives the initial value for the
   internal counter ("1" by default). If the given value is less than
   "0" a "ValueError" is raised.

   Changed in version 3.10: Removed the *loop* parameter.

   The preferred way to use a Semaphore is an "async with" statement:

      sem = asyncio.Semaphore(10)

      # ... later
      async with sem:
          # work with shared resource
   which is equivalent to:

      sem = asyncio.Semaphore(10)

      # ... later
      await sem.acquire()
      try:
          # work with shared resource
      finally:
          sem.release()

   async acquire()

      Acquire a semaphore.

      If the internal counter is greater than zero, decrement it by
      one and return "True" immediately. If it is zero, wait until a
      "release()" is called and return "True".

   locked()

      Return "True" if the semaphore cannot be acquired immediately.

   release()

      Release a semaphore, incrementing the internal counter by one.
      Can wake up a task waiting to acquire the semaphore.

      Unlike "BoundedSemaphore", "Semaphore" allows making more
      "release()" calls than "acquire()" calls.

BoundedSemaphore
================

class asyncio.BoundedSemaphore(value=1)

   A bounded semaphore object. Not thread-safe.

   A bounded semaphore is a version of "Semaphore" that raises a
   "ValueError" in "release()" if it increases the internal counter
   above the initial *value*.

   Changed in version 3.10: Removed the *loop* parameter.

Barrier
=======

class asyncio.Barrier(parties)

   A barrier object. Not thread-safe.

   A barrier is a simple synchronization primitive that allows
   blocking until *parties* number of tasks are waiting on it. Tasks
   can wait on the "wait()" method and are blocked until the
   specified number of tasks end up waiting on "wait()". At that
   point all of the waiting tasks unblock simultaneously.

   "async with" can be used as an alternative to awaiting on
   "wait()".

   The barrier can be reused any number of times.

   Example:

      async def example_barrier():
          # barrier with 3 parties
          b = asyncio.Barrier(3)

          # create 2 new waiting tasks
          asyncio.create_task(b.wait())
          asyncio.create_task(b.wait())

          await asyncio.sleep(0)
          print(b)

          # The third .wait() call passes the barrier
          await b.wait()
          print(b)
          print("barrier passed")

          await asyncio.sleep(0)
          print(b)

      asyncio.run(example_barrier())

   The result of this example is:

      barrier passed

   Added in version 3.11.

   async wait()

      Pass the barrier. When all the tasks party to the barrier have
      called this function, they are all unblocked simultaneously.

      When a waiting or blocked task in the barrier is cancelled,
      this task exits the barrier which stays in the same state. If
      the state of the barrier is “filling”, the number of waiting
      tasks decreases by 1.

      The return value is an integer in the range of 0 to
      "parties-1", different for each task. This can be used to
      select a task to do some special housekeeping, e.g.:

         ...
         async with barrier as position:
             if position == 0:
                 # Only one task prints this
                 print('End of *draining phase*')

      This method may raise a "BrokenBarrierError" exception if the
      barrier is broken or reset while a task is waiting. It could
      raise a "CancelledError" if a task is cancelled.

   async reset()

      Return the barrier to the default, empty state. Any tasks
      waiting on it will receive the "BrokenBarrierError" exception.

      If a barrier is broken it may be better to just leave it and
      create a new one.

   async abort()

      Put the barrier into a broken state. This causes any active or
      future calls to "wait()" to fail with the "BrokenBarrierError".
      Use this, for example, if one of the tasks needs to abort, to
      avoid tasks waiting on the barrier forever.

   parties

      The number of tasks required to pass the barrier.

   n_waiting

      The number of tasks currently waiting in the barrier while
      filling.

   broken

      A boolean that is "True" if the barrier is in the broken state.

exception asyncio.BrokenBarrierError

   This exception, a subclass of "RuntimeError", is raised when the
   "Barrier" object is reset or broken.
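   As an illustration, here is a minimal sketch (the coroutine names
   are illustrative) of aborting a barrier so that a task already
   waiting on it fails with "BrokenBarrierError" instead of waiting
   forever:

      import asyncio

      async def waiter(barrier):
          try:
              await barrier.wait()
          except asyncio.BrokenBarrierError:
              print('the barrier was aborted')

      async def main():
          # The barrier expects 3 parties, but only one task waits.
          barrier = asyncio.Barrier(3)
          task = asyncio.create_task(waiter(barrier))

          await asyncio.sleep(0)   # let the waiter block on wait()
          await barrier.abort()    # break the barrier
          await task               # the waiter handled the error

      asyncio.run(main())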
======================================================================

Changed in version 3.9: Acquiring a lock using "await lock" or "yield
from lock" and/or "with" statement ("with await lock", "with (yield
from lock)") was removed. Use "async with lock" instead.

Coroutines and Tasks
********************

This section outlines high-level asyncio APIs to work with coroutines
and Tasks.

* Coroutines

* Awaitables

* Creating Tasks

* Task Cancellation

* Task Groups

* Sleeping

* Running Tasks Concurrently

* Eager Task Factory

* Shielding From Cancellation

* Timeouts

* Waiting Primitives

* Running in Threads

* Scheduling From Other Threads

* Introspection

* Task Object

Coroutines
==========

**Source code:** Lib/asyncio/coroutines.py

======================================================================

*Coroutines* declared with the async/await syntax are the preferred
way of writing asyncio applications. For example, the following
snippet of code prints “hello”, waits 1 second, and then prints
“world”:

   >>> import asyncio

   >>> async def main():
   ...     print('hello')
   ...     await asyncio.sleep(1)
   ...     print('world')

   >>> asyncio.run(main())
   hello
   world

Note that simply calling a coroutine will not schedule it to be
executed:

   >>> main()

To actually run a coroutine, asyncio provides the following
mechanisms:

* The "asyncio.run()" function to run the top-level entry point
  “main()” function (see the above example).

* Awaiting on a coroutine. The following snippet of code will print
  “hello” after waiting for 1 second, and then print “world” after
  waiting for *another* 2 seconds:

     import asyncio
     import time

     async def say_after(delay, what):
         await asyncio.sleep(delay)
         print(what)

     async def main():
         print(f"started at {time.strftime('%X')}")

         await say_after(1, 'hello')
         await say_after(2, 'world')

         print(f"finished at {time.strftime('%X')}")

     asyncio.run(main())

  Expected output:

     started at 17:13:52
     hello
     world
     finished at 17:13:55

* The "asyncio.create_task()" function to run coroutines concurrently
  as asyncio "Tasks".

  Let’s modify the above example and run two "say_after" coroutines
  *concurrently*:

     async def main():
         task1 = asyncio.create_task(
             say_after(1, 'hello'))

         task2 = asyncio.create_task(
             say_after(2, 'world'))

         print(f"started at {time.strftime('%X')}")

         # Wait until both tasks are completed (should take
         # around 2 seconds.)
         await task1
         await task2

         print(f"finished at {time.strftime('%X')}")

  Note that the expected output now shows that the snippet runs 1
  second faster than before:

     started at 17:14:32
     hello
     world
     finished at 17:14:34

* The "asyncio.TaskGroup" class provides a more modern alternative to
  "create_task()". Using this API, the last example becomes:

     async def main():
         async with asyncio.TaskGroup() as tg:
             task1 = tg.create_task(
                 say_after(1, 'hello'))

             task2 = tg.create_task(
                 say_after(2, 'world'))

             print(f"started at {time.strftime('%X')}")

         # The await is implicit when the context manager exits.

         print(f"finished at {time.strftime('%X')}")

  The timing and output should be the same as for the previous
  version.

  Added in version 3.11: "asyncio.TaskGroup".

Awaitables
==========

We say that an object is an **awaitable** object if it can be used in
an "await" expression. Many asyncio APIs are designed to accept
awaitables.

There are three main types of *awaitable* objects: **coroutines**,
**Tasks**, and **Futures**.
-[ Coroutines ]-

Python coroutines are *awaitables* and therefore can be awaited from
other coroutines:

   import asyncio

   async def nested():
       return 42

   async def main():
       # Nothing happens if we just call "nested()".
       # A coroutine object is created but not awaited,
       # so it *won't run at all*.
       nested()  # will raise a "RuntimeWarning".

       # Let's do it differently now and await it:
       print(await nested())  # will print "42".

   asyncio.run(main())

Important: In this documentation the term “coroutine” can be used for
two closely related concepts:

* a *coroutine function*: an "async def" function;

* a *coroutine object*: an object returned by calling a *coroutine
  function*.

-[ Tasks ]-

*Tasks* are used to schedule coroutines *concurrently*.

When a coroutine is wrapped into a *Task* with functions like
"asyncio.create_task()" the coroutine is automatically scheduled to
run soon:

   import asyncio

   async def nested():
       return 42

   async def main():
       # Schedule nested() to run soon concurrently
       # with "main()".
       task = asyncio.create_task(nested())

       # "task" can now be used to cancel "nested()", or
       # can simply be awaited to wait until it is complete:
       await task

   asyncio.run(main())

-[ Futures ]-

A "Future" is a special **low-level** awaitable object that
represents an **eventual result** of an asynchronous operation.

When a Future object is *awaited* it means that the coroutine will
wait until the Future is resolved in some other place.

Future objects in asyncio are needed to allow callback-based code to
be used with async/await.

Normally **there is no need** to create Future objects in
application-level code.

Future objects, sometimes exposed by libraries and some asyncio APIs,
can be awaited:

   async def main():
       await function_that_returns_a_future_object()

       # this is also valid:
       await asyncio.gather(
           function_that_returns_a_future_object(),
           some_python_coroutine()
       )

A good example of a low-level function that returns a Future object
is "loop.run_in_executor()".

Creating Tasks
==============

**Source code:** Lib/asyncio/tasks.py

======================================================================

asyncio.create_task(coro, *, name=None, context=None)

   Wrap the *coro* coroutine into a "Task" and schedule its
   execution. Return the Task object.

   If *name* is not "None", it is set as the name of the task using
   "Task.set_name()".

   An optional keyword-only *context* argument allows specifying a
   custom "contextvars.Context" for the *coro* to run in. A copy of
   the current context is created when no *context* is provided.

   The task is executed in the loop returned by "get_running_loop()";
   "RuntimeError" is raised if there is no running loop in the
   current thread.

   Note: "asyncio.TaskGroup.create_task()" is a new alternative
   leveraging structural concurrency; it allows for waiting for a
   group of related tasks with strong safety guarantees.

   Important: Save a reference to the result of this function, to
   avoid a task disappearing mid-execution. The event loop only keeps
   weak references to tasks. A task that isn’t referenced elsewhere
   may get garbage collected at any time, even before it’s done. For
   reliable “fire-and-forget” background tasks, gather them in a
   collection:

      background_tasks = set()

      for i in range(10):
          task = asyncio.create_task(some_coro(param=i))

          # Add task to the set. This creates a strong reference.
          background_tasks.add(task)

          # To prevent keeping references to finished tasks forever,
          # make each task remove its own reference from the set
          # after completion:
          task.add_done_callback(background_tasks.discard)

   Added in version 3.7.

   Changed in version 3.8: Added the *name* parameter.

   Changed in version 3.11: Added the *context* parameter.

Task Cancellation
=================

Tasks can easily and safely be cancelled. When a task is cancelled,
"asyncio.CancelledError" will be raised in the task at the next
opportunity.

It is recommended that coroutines use "try/finally" blocks to
robustly perform clean-up logic. In case "asyncio.CancelledError" is
explicitly caught, it should generally be propagated when clean-up is
complete. "asyncio.CancelledError" directly subclasses
"BaseException", so it is not caught by "except Exception" clauses
and most code will not need to be aware of it.

The asyncio components that enable structured concurrency, like
"asyncio.TaskGroup" and "asyncio.timeout()", are implemented using
cancellation internally and might misbehave if a coroutine swallows
"asyncio.CancelledError". Similarly, user code should not generally
call "uncancel()". However, in cases when suppressing
"asyncio.CancelledError" is truly desired, it is necessary to also
call "uncancel()" to completely remove the cancellation state.

Task Groups
===========

Task groups combine a task creation API with a convenient and
reliable way to wait for all tasks in the group to finish.

class asyncio.TaskGroup

   An asynchronous context manager holding a group of tasks. Tasks
   can be added to the group using "create_task()". All tasks are
   awaited when the context manager exits.

   Added in version 3.11.

   create_task(coro, *, name=None, context=None)

      Create a task in this task group. The signature matches that of
      "asyncio.create_task()". If the task group is inactive (e.g.
      not yet entered, already finished, or in the process of
      shutting down), the given "coro" will be closed.

      Changed in version 3.13: Close the given coroutine if the task
      group is not active.

   Example:

      async def main():
          async with asyncio.TaskGroup() as tg:
              task1 = tg.create_task(some_coro(...))
              task2 = tg.create_task(another_coro(...))
          print(f"Both tasks have completed now: {task1.result()}, {task2.result()}")

   The "async with" statement will wait for all tasks in the group to
   finish. While waiting, new tasks may still be added to the group
   (for example, by passing "tg" into one of the coroutines and
   calling "tg.create_task()" in that coroutine). Once the last task
   has finished and the "async with" block is exited, no new tasks
   may be added to the group.

   The first time any of the tasks belonging to the group fails with
   an exception other than "asyncio.CancelledError", the remaining
   tasks in the group are cancelled. No further tasks can then be
   added to the group. At this point, if the body of the "async with"
   statement is still active (i.e., "__aexit__()" hasn’t been called
   yet), the task directly containing the "async with" statement is
   also cancelled. The resulting "asyncio.CancelledError" will
   interrupt an "await", but it will not bubble out of the containing
   "async with" statement.

   Once all tasks have finished, if any tasks have failed with an
   exception other than "asyncio.CancelledError", those exceptions
   are combined in an "ExceptionGroup" or "BaseExceptionGroup" (as
   appropriate; see their documentation) which is then raised.
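   For instance, such a grouped failure can be handled with an
   "except*" clause; a minimal sketch (the failing coroutine is
   illustrative):

      import asyncio

      async def fail(msg):
          raise ValueError(msg)

      async def main():
          try:
              async with asyncio.TaskGroup() as tg:
                  tg.create_task(fail('first'))
                  tg.create_task(fail('second'))
          except* ValueError as eg:
              # eg is an ExceptionGroup holding the ValueError(s)
              # raised by the failed tasks.
              print(f'caught {len(eg.exceptions)} error(s)')

      asyncio.run(main())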
   Two base exceptions are treated specially: If any task fails with
   "KeyboardInterrupt" or "SystemExit", the task group still cancels
   the remaining tasks and waits for them, but then the initial
   "KeyboardInterrupt" or "SystemExit" is re-raised instead of
   "ExceptionGroup" or "BaseExceptionGroup".

   If the body of the "async with" statement exits with an exception
   (so "__aexit__()" is called with an exception set), this is
   treated the same as if one of the tasks failed: the remaining
   tasks are cancelled and then waited for, and non-cancellation
   exceptions are grouped into an exception group and raised. The
   exception passed into "__aexit__()", unless it is
   "asyncio.CancelledError", is also included in the exception group.
   The same special case is made for "KeyboardInterrupt" and
   "SystemExit" as in the previous paragraph.

   Task groups are careful not to mix up the internal cancellation
   used to “wake up” their "__aexit__()" with cancellation requests
   for the task in which they are running made by other parties. In
   particular, when one task group is syntactically nested in
   another, and both experience an exception in one of their child
   tasks simultaneously, the inner task group will process its
   exceptions, and then the outer task group will receive another
   cancellation and process its own exceptions.

   In the case where a task group is cancelled externally and also
   must raise an "ExceptionGroup", it will call the parent task’s
   "cancel()" method. This ensures that an "asyncio.CancelledError"
   will be raised at the next "await", so the cancellation is not
   lost.

   Task groups preserve the cancellation count reported by
   "asyncio.Task.cancelling()".

   Changed in version 3.13: Improved handling of simultaneous
   internal and external cancellations and correct preservation of
   cancellation counts.

Terminating a Task Group
------------------------

While terminating a task group is not natively supported by the
standard library, termination can be achieved by adding an
exception-raising task to the task group and ignoring the raised
exception:

   import asyncio
   from asyncio import TaskGroup

   class TerminateTaskGroup(Exception):
       """Exception raised to terminate a task group."""

   async def force_terminate_task_group():
       """Used to force termination of a task group."""
       raise TerminateTaskGroup()

   async def job(task_id, sleep_time):
       print(f'Task {task_id}: start')
       await asyncio.sleep(sleep_time)
       print(f'Task {task_id}: done')

   async def main():
       try:
           async with TaskGroup() as group:
               # spawn some tasks
               group.create_task(job(1, 0.5))
               group.create_task(job(2, 1.5))
               # sleep for 1 second
               await asyncio.sleep(1)
               # add an exception-raising task to force the group
               # to terminate
               group.create_task(force_terminate_task_group())
       except* TerminateTaskGroup:
           pass

   asyncio.run(main())

Expected output:

   Task 1: start
   Task 2: start
   Task 1: done

Sleeping
========

async asyncio.sleep(delay, result=None)

   Block for *delay* seconds.

   If *result* is provided, it is returned to the caller when the
   coroutine completes.

   "sleep()" always suspends the current task, allowing other tasks
   to run.

   Setting the delay to 0 provides an optimized path to allow other
   tasks to run. This can be used by long-running functions to avoid
   blocking the event loop for the full duration of the function
   call.
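   For example, a long-running coroutine can yield to the event loop
   periodically with "await asyncio.sleep(0)"; a minimal sketch (the
   function name and the yield interval are illustrative):

      import asyncio

      async def crunch(items):
          total = 0
          for i, item in enumerate(items):
              total += item              # some CPU-bound work
              if i % 1000 == 0:
                  # Yield control so other tasks can run.
                  await asyncio.sleep(0)
          return total

      print(asyncio.run(crunch(range(10_000))))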
   Example of coroutine displaying the current date every second for
   5 seconds:

      import asyncio
      import datetime

      async def display_date():
          loop = asyncio.get_running_loop()
          end_time = loop.time() + 5.0
          while True:
              print(datetime.datetime.now())
              if (loop.time() + 1.0) >= end_time:
                  break
              await asyncio.sleep(1)

      asyncio.run(display_date())

   Changed in version 3.10: Removed the *loop* parameter.

   Changed in version 3.13: Raises "ValueError" if *delay* is "nan".

Running Tasks Concurrently
==========================

awaitable asyncio.gather(*aws, return_exceptions=False)

   Run awaitable objects in the *aws* sequence *concurrently*.

   If any awaitable in *aws* is a coroutine, it is automatically
   scheduled as a Task.

   If all awaitables are completed successfully, the result is an
   aggregate list of returned values. The order of result values
   corresponds to the order of awaitables in *aws*.

   If *return_exceptions* is "False" (default), the first raised
   exception is immediately propagated to the task that awaits on
   "gather()". Other awaitables in the *aws* sequence **won’t be
   cancelled** and will continue to run.

   If *return_exceptions* is "True", exceptions are treated the same
   as successful results, and aggregated in the result list.

   If "gather()" is *cancelled*, all submitted awaitables (that have
   not completed yet) are also *cancelled*.

   If any Task or Future from the *aws* sequence is *cancelled*, it
   is treated as if it raised "CancelledError" – the "gather()" call
   is **not** cancelled in this case. This is to prevent the
   cancellation of one submitted Task/Future from causing other
   Tasks/Futures to be cancelled.

   Note: A new alternative to create and run tasks concurrently and
   wait for their completion is "asyncio.TaskGroup". *TaskGroup*
   provides stronger safety guarantees than *gather* for scheduling a
   nesting of subtasks: if a task (or a subtask, a task scheduled by
   a task) raises an exception, *TaskGroup* will cancel the remaining
   scheduled tasks, while *gather* will not.

   Example:

      import asyncio

      async def factorial(name, number):
          f = 1
          for i in range(2, number + 1):
              print(f"Task {name}: Compute factorial({number}), currently i={i}...")
              await asyncio.sleep(1)
              f *= i
          print(f"Task {name}: factorial({number}) = {f}")
          return f

      async def main():
          # Schedule three calls *concurrently*:
          L = await asyncio.gather(
              factorial("A", 2),
              factorial("B", 3),
              factorial("C", 4),
          )
          print(L)

      asyncio.run(main())

      # Expected output:
      #
      #     Task A: Compute factorial(2), currently i=2...
      #     Task B: Compute factorial(3), currently i=2...
      #     Task C: Compute factorial(4), currently i=2...
      #     Task A: factorial(2) = 2
      #     Task B: Compute factorial(3), currently i=3...
      #     Task C: Compute factorial(4), currently i=3...
      #     Task B: factorial(3) = 6
      #     Task C: Compute factorial(4), currently i=4...
      #     Task C: factorial(4) = 24
      #     [2, 6, 24]

   Note: If *return_exceptions* is "False", cancelling gather() after
   it has been marked done won’t cancel any submitted awaitables. For
   instance, gather can be marked done after propagating an exception
   to the caller; therefore, calling "gather.cancel()" after catching
   an exception (raised by one of the awaitables) from gather won’t
   cancel any other awaitables.

   Changed in version 3.7: If the *gather* itself is cancelled, the
   cancellation is propagated regardless of *return_exceptions*.

   Changed in version 3.10: Removed the *loop* parameter.
   Deprecated since version 3.10: Deprecation warning is emitted if
   no positional arguments are provided or not all positional
   arguments are Future-like objects and there is no running event
   loop.

Eager Task Factory
==================

asyncio.eager_task_factory(loop, coro, *, name=None, context=None)

   A task factory for eager task execution.

   When using this factory (via
   "loop.set_task_factory(asyncio.eager_task_factory)"), coroutines
   begin execution synchronously during "Task" construction. Tasks
   are only scheduled on the event loop if they block. This can be a
   performance improvement as the overhead of loop scheduling is
   avoided for coroutines that complete synchronously.

   A common example where this is beneficial is coroutines which
   employ caching or memoization to avoid actual I/O when possible.

   Note: Immediate execution of the coroutine is a semantic change.
   If the coroutine returns or raises, the task is never scheduled to
   the event loop. If the coroutine execution blocks, the task is
   scheduled to the event loop. This change may introduce behavior
   changes to existing applications. For example, the application’s
   task execution order is likely to change.

   Added in version 3.12.

asyncio.create_eager_task_factory(custom_task_constructor)

   Create an eager task factory, similar to "eager_task_factory()",
   using the provided *custom_task_constructor* when creating a new
   task instead of the default "Task".

   *custom_task_constructor* must be a *callable* with a signature
   matching that of "Task.__init__". The callable must return an
   "asyncio.Task"-compatible object.

   This function returns a *callable* intended to be used as a task
   factory of an event loop via "loop.set_task_factory(factory)".

   Added in version 3.12.

Shielding From Cancellation
===========================

awaitable asyncio.shield(aw)

   Protect an awaitable object from being "cancelled".

   If *aw* is a coroutine it is automatically scheduled as a Task.

   The statement:

      task = asyncio.create_task(something())
      res = await shield(task)

   is equivalent to:

      res = await something()

   *except* that if the coroutine containing it is cancelled, the
   Task running in "something()" is not cancelled. From the point of
   view of "something()", the cancellation did not happen. However,
   its caller is still cancelled, so the "await" expression still
   raises a "CancelledError".

   If "something()" is cancelled by other means (i.e., from within
   itself), that would also cancel "shield()".

   If it is desired to completely ignore cancellation (not
   recommended) the "shield()" function should be combined with a
   try/except clause, as follows:

      task = asyncio.create_task(something())
      try:
          res = await shield(task)
      except CancelledError:
          res = None

   Important: Save a reference to tasks passed to this function, to
   avoid a task disappearing mid-execution. The event loop only keeps
   weak references to tasks. A task that isn’t referenced elsewhere
   may get garbage collected at any time, even before it’s done.

   Changed in version 3.10: Removed the *loop* parameter.

   Deprecated since version 3.10: Deprecation warning is emitted if
   *aw* is not a Future-like object and there is no running event
   loop.

Timeouts
========

asyncio.timeout(delay)

   Return an asynchronous context manager that can be used to limit
   the amount of time spent waiting on something.

   *delay* can either be "None", or a float/int number of seconds to
   wait. If *delay* is "None", no time limit will be applied; this
   can be useful if the delay is unknown when the context manager is
   created.
   In either case, the context manager can be rescheduled after
   creation using "Timeout.reschedule()".

   Example:

      async def main():
          async with asyncio.timeout(10):
              await long_running_task()

   If "long_running_task" takes more than 10 seconds to complete, the
   context manager will cancel the current task and handle the
   resulting "asyncio.CancelledError" internally, transforming it
   into a "TimeoutError" which can be caught and handled.

   Note: The "asyncio.timeout()" context manager is what transforms
   the "asyncio.CancelledError" into a "TimeoutError", which means
   the "TimeoutError" can only be caught *outside* of the context
   manager.

   Example of catching "TimeoutError":

      async def main():
          try:
              async with asyncio.timeout(10):
                  await long_running_task()
          except TimeoutError:
              print("The long operation timed out, but we've handled it.")

          print("This statement will run regardless.")

   The context manager produced by "asyncio.timeout()" can be
   rescheduled to a different deadline and inspected.

class asyncio.Timeout(when)

   An asynchronous context manager for cancelling overdue coroutines.

   "when" should be an absolute time at which the context should time
   out, as measured by the event loop’s clock:

   * If "when" is "None", the timeout will never trigger.

   * If "when < loop.time()", the timeout will trigger on the next
     iteration of the event loop.

   when() -> float | None

      Return the current deadline, or "None" if the current deadline
      is not set.

   reschedule(when: float | None)

      Reschedule the timeout.

   expired() -> bool

      Return whether the context manager has exceeded its deadline
      (expired).

   Example:

      async def main():
          try:
              # We do not know the timeout when starting, so we pass ``None``.
              async with asyncio.timeout(None) as cm:
                  # We know the timeout now, so we reschedule it.
                  new_deadline = get_running_loop().time() + 10
                  cm.reschedule(new_deadline)

                  await long_running_task()
          except TimeoutError:
              pass

          if cm.expired():
              print("Looks like we haven't finished on time.")

   Timeout context managers can be safely nested.

   Added in version 3.11.

asyncio.timeout_at(when)

   Similar to "asyncio.timeout()", except *when* is the absolute time
   to stop waiting, or "None".

   Example:

      async def main():
          loop = get_running_loop()
          deadline = loop.time() + 20
          try:
              async with asyncio.timeout_at(deadline):
                  await long_running_task()
          except TimeoutError:
              print("The long operation timed out, but we've handled it.")

          print("This statement will run regardless.")

   Added in version 3.11.

async asyncio.wait_for(aw, timeout)

   Wait for the *aw* awaitable to complete with a timeout.

   If *aw* is a coroutine it is automatically scheduled as a Task.

   *timeout* can either be "None" or a float or int number of seconds
   to wait for. If *timeout* is "None", block until the future
   completes.

   If a timeout occurs, it cancels the task and raises
   "TimeoutError". To avoid the task cancellation, wrap it in
   "shield()".

   The function will wait until the future is actually cancelled, so
   the total wait time may exceed the *timeout*. If an exception
   happens during cancellation, it is propagated.

   If the wait is cancelled, the future *aw* is also cancelled.

   Example:

      async def eternity():
          # Sleep for one hour
          await asyncio.sleep(3600)
          print('yay!')

      async def main():
          # Wait for at most 1 second
          try:
              await asyncio.wait_for(eternity(), timeout=1.0)
          except TimeoutError:
              print('timeout!')

      asyncio.run(main())

      # Expected output:
      #
      #     timeout!

   Changed in version 3.7: When *aw* is cancelled due to a timeout,
   "wait_for" waits for *aw* to be cancelled. Previously, it raised
   "TimeoutError" immediately.
   Changed in version 3.10: Removed the *loop* parameter.

   Changed in version 3.11: Raises "TimeoutError" instead of
   "asyncio.TimeoutError".

Waiting Primitives
==================

async asyncio.wait(aws, *, timeout=None, return_when=ALL_COMPLETED)

   Run "Future" and "Task" instances in the *aws* iterable
   concurrently and block until the condition specified by
   *return_when* is met.

   The *aws* iterable must not be empty.

   Returns two sets of Tasks/Futures: "(done, pending)".

   Usage:

      done, pending = await asyncio.wait(aws)

   *timeout* (a float or int), if specified, can be used to control
   the maximum number of seconds to wait before returning.

   Note that this function does not raise "TimeoutError". Futures or
   Tasks that aren’t done when the timeout occurs are simply returned
   in the second set.

   *return_when* indicates when this function should return. It must
   be one of the following constants:

   +---------------------------+----------------------------------------------------+
   | Constant                  | Description                                        |
   |===========================|====================================================|
   | asyncio.FIRST_COMPLETED   | The function will return when any future finishes |
   |                           | or is cancelled.                                   |
   +---------------------------+----------------------------------------------------+
   | asyncio.FIRST_EXCEPTION   | The function will return when any future finishes |
   |                           | by raising an exception. If no future raises an   |
   |                           | exception then it is equivalent to                 |
   |                           | "ALL_COMPLETED".                                   |
   +---------------------------+----------------------------------------------------+
   | asyncio.ALL_COMPLETED     | The function will return when all futures finish  |
   |                           | or are cancelled.                                  |
   +---------------------------+----------------------------------------------------+

   Unlike "wait_for()", "wait()" does not cancel the futures when a
   timeout occurs.

   Changed in version 3.10: Removed the *loop* parameter.

   Changed in version 3.11: Passing coroutine objects to "wait()"
   directly is forbidden.

   Changed in version 3.12: Added support for generators yielding
   tasks.

asyncio.as_completed(aws, *, timeout=None)

   Run awaitable objects in the *aws* iterable concurrently. The
   returned object can be iterated to obtain the results of the
   awaitables as they finish.

   The object returned by "as_completed()" can be iterated as an
   *asynchronous iterator* or a plain *iterator*. When asynchronous
   iteration is used, the originally-supplied awaitables are yielded
   if they are tasks or futures. This makes it easy to correlate
   previously-scheduled tasks with their results.

   Example:

      ipv4_connect = create_task(open_connection("127.0.0.1", 80))
      ipv6_connect = create_task(open_connection("::1", 80))
      tasks = [ipv4_connect, ipv6_connect]

      async for earliest_connect in as_completed(tasks):
          # earliest_connect is done. The result can be obtained by
          # awaiting it or calling earliest_connect.result()
          reader, writer = await earliest_connect

          if earliest_connect is ipv6_connect:
              print("IPv6 connection established.")
          else:
              print("IPv4 connection established.")

   During asynchronous iteration, implicitly-created tasks will be
   yielded for supplied awaitables that aren’t tasks or futures.

   When used as a plain iterator, each iteration yields a new
   coroutine that returns the result or raises the exception of the
   next completed awaitable.
   This pattern is compatible with Python versions older than 3.13:

      ipv4_connect = create_task(open_connection("127.0.0.1", 80))
      ipv6_connect = create_task(open_connection("::1", 80))
      tasks = [ipv4_connect, ipv6_connect]

      for next_connect in as_completed(tasks):
          # next_connect is not one of the original task objects. It
          # must be awaited to obtain the result value or raise the
          # exception of the awaitable that finishes next.
          reader, writer = await next_connect

   A "TimeoutError" is raised if the timeout occurs before all
   awaitables are done. This is raised by the "async for" loop during
   asynchronous iteration or by the coroutines yielded during plain
   iteration.

   Changed in version 3.10: Removed the *loop* parameter.

   Deprecated since version 3.10: Deprecation warning is emitted if
   not all awaitable objects in the *aws* iterable are Future-like
   objects and there is no running event loop.

   Changed in version 3.12: Added support for generators yielding
   tasks.

   Changed in version 3.13: The result can now be used as either an
   *asynchronous iterator* or as a plain *iterator* (previously it
   was only a plain iterator).

Running in Threads
==================

async asyncio.to_thread(func, /, *args, **kwargs)

   Asynchronously run function *func* in a separate thread.

   Any *args and **kwargs supplied for this function are directly
   passed to *func*. Also, the current "contextvars.Context" is
   propagated, allowing context variables from the event loop thread
   to be accessed in the separate thread.

   Return a coroutine that can be awaited to get the eventual result
   of *func*.

   This coroutine function is primarily intended to be used for
   executing IO-bound functions/methods that would otherwise block
   the event loop if they were run in the main thread.

   For example:

      def blocking_io():
          print(f"start blocking_io at {time.strftime('%X')}")
          # Note that time.sleep() can be replaced with any blocking
          # IO-bound operation, such as file operations.
          time.sleep(1)
          print(f"blocking_io complete at {time.strftime('%X')}")

      async def main():
          print(f"started main at {time.strftime('%X')}")

          await asyncio.gather(
              asyncio.to_thread(blocking_io),
              asyncio.sleep(1))

          print(f"finished main at {time.strftime('%X')}")

      asyncio.run(main())

      # Expected output:
      #
      #     started main at 19:50:53
      #     start blocking_io at 19:50:53
      #     blocking_io complete at 19:50:54
      #     finished main at 19:50:54

   Directly calling "blocking_io()" in any coroutine would block the
   event loop for its duration, resulting in an additional 1 second
   of run time. Instead, by using "asyncio.to_thread()", we can run
   it in a separate thread without blocking the event loop.

   Note: Due to the *GIL*, "asyncio.to_thread()" can typically only
   be used to make IO-bound functions non-blocking. However, for
   extension modules that release the GIL or alternative Python
   implementations that don’t have one, "asyncio.to_thread()" can
   also be used for CPU-bound functions.

   Added in version 3.9.

Scheduling From Other Threads
=============================

asyncio.run_coroutine_threadsafe(coro, loop)

   Submit a coroutine to the given event loop. Thread-safe.

   Return a "concurrent.futures.Future" to wait for the result from
   another OS thread.

   This function is meant to be called from a different OS thread
   than the one where the event loop is running.
   Example:

      # Create a coroutine
      coro = asyncio.sleep(1, result=3)

      # Submit the coroutine to a given loop
      future = asyncio.run_coroutine_threadsafe(coro, loop)

      # Wait for the result with an optional timeout argument
      assert future.result(timeout) == 3

   If an exception is raised in the coroutine, the returned Future
   will be notified. It can also be used to cancel the task in the
   event loop:

      try:
          result = future.result(timeout)
      except TimeoutError:
          print('The coroutine took too long, cancelling the task...')
          future.cancel()
      except Exception as exc:
          print(f'The coroutine raised an exception: {exc!r}')
      else:
          print(f'The coroutine returned: {result!r}')

   See the concurrency and multithreading section of the
   documentation.

   Unlike other asyncio functions this function requires the *loop*
   argument to be passed explicitly.

   Added in version 3.5.1.

Introspection
=============

asyncio.current_task(loop=None)

   Return the currently running "Task" instance, or "None" if no task
   is running.

   If *loop* is "None", "get_running_loop()" is used to get the
   current loop.

   Added in version 3.7.

asyncio.all_tasks(loop=None)

   Return a set of not yet finished "Task" objects run by the loop.

   If *loop* is "None", "get_running_loop()" is used for getting the
   current loop.

   Added in version 3.7.

asyncio.iscoroutine(obj)

   Return "True" if *obj* is a coroutine object.

   Added in version 3.4.

Task Object
===========

class asyncio.Task(coro, *, loop=None, name=None, context=None, eager_start=False)

   A "Future-like" object that runs a Python coroutine. Not
   thread-safe.

   Tasks are used to run coroutines in event loops. If a coroutine
   awaits on a Future, the Task suspends the execution of the
   coroutine and waits for the completion of the Future. When the
   Future is *done*, the execution of the wrapped coroutine resumes.

   Event loops use cooperative scheduling: an event loop runs one
   Task at a time. While a Task waits for the completion of a Future,
   the event loop runs other Tasks, callbacks, or performs IO
   operations.

   Use the high-level "asyncio.create_task()" function to create
   Tasks, or the low-level "loop.create_task()" or "ensure_future()"
   functions. Manual instantiation of Tasks is discouraged.

   To cancel a running Task use the "cancel()" method. Calling it
   will cause the Task to throw a "CancelledError" exception into the
   wrapped coroutine. If a coroutine is awaiting on a Future object
   during cancellation, the Future object will be cancelled.

   "cancelled()" can be used to check if the Task was cancelled. The
   method returns "True" if the wrapped coroutine did not suppress
   the "CancelledError" exception and was actually cancelled.

   "asyncio.Task" inherits from "Future" all of its APIs except
   "Future.set_result()" and "Future.set_exception()".

   An optional keyword-only *context* argument allows specifying a
   custom "contextvars.Context" for the *coro* to run in. If no
   *context* is provided, the Task copies the current context and
   later runs its coroutine in the copied context.

   An optional keyword-only *eager_start* argument allows eagerly
   starting the execution of the "asyncio.Task" at task creation
   time. If set to "True" and the event loop is running, the task
   will start executing the coroutine immediately, until the first
   time the coroutine blocks. If the coroutine returns or raises
   without blocking, the task will be finished eagerly and will skip
   scheduling to the event loop.

   Changed in version 3.7: Added support for the "contextvars"
   module.

   Changed in version 3.8: Added the *name* parameter.
   Deprecated since version 3.10: Deprecation warning is emitted if
   *loop* is not specified and there is no running event loop.

   Changed in version 3.11: Added the *context* parameter.

   Changed in version 3.12: Added the *eager_start* parameter.

   done()

      Return "True" if the Task is *done*.

      A Task is *done* when the wrapped coroutine either returned a
      value, raised an exception, or the Task was cancelled.

   result()

      Return the result of the Task.

      If the Task is *done*, the result of the wrapped coroutine is
      returned (or, if the coroutine raised an exception, that
      exception is re-raised).

      If the Task has been *cancelled*, this method raises a
      "CancelledError" exception.

      If the Task’s result isn’t yet available, this method raises an
      "InvalidStateError" exception.

   exception()

      Return the exception of the Task.

      If the wrapped coroutine raised an exception, that exception is
      returned. If the wrapped coroutine returned normally this
      method returns "None".

      If the Task has been *cancelled*, this method raises a
      "CancelledError" exception.

      If the Task isn’t *done* yet, this method raises an
      "InvalidStateError" exception.

   add_done_callback(callback, *, context=None)

      Add a callback to be run when the Task is *done*.

      This method should only be used in low-level callback-based
      code.

      See the documentation of "Future.add_done_callback()" for more
      details.

   remove_done_callback(callback)

      Remove *callback* from the callbacks list.

      This method should only be used in low-level callback-based
      code.

      See the documentation of "Future.remove_done_callback()" for
      more details.

   get_stack(*, limit=None)

      Return the list of stack frames for this Task.

      If the wrapped coroutine is not done, this returns the stack
      where it is suspended. If the coroutine has completed
      successfully or was cancelled, this returns an empty list. If
      the coroutine was terminated by an exception, this returns the
      list of traceback frames.

      The frames are always ordered from oldest to newest.

      Only one stack frame is returned for a suspended coroutine.

      The optional *limit* argument sets the maximum number of frames
      to return; by default all available frames are returned. The
      ordering of the returned list differs depending on whether a
      stack or a traceback is returned: the newest frames of a stack
      are returned, but the oldest frames of a traceback are
      returned. (This matches the behavior of the traceback module.)

   print_stack(*, limit=None, file=None)

      Print the stack or traceback for this Task.

      This produces output similar to that of the traceback module
      for the frames retrieved by "get_stack()".

      The *limit* argument is passed to "get_stack()" directly.

      The *file* argument is an I/O stream to which the output is
      written; by default output is written to "sys.stdout".

   get_coro()

      Return the coroutine object wrapped by the "Task".

      Note: This will return "None" for Tasks which have already
      completed eagerly. See the Eager Task Factory.

      Added in version 3.8.

      Changed in version 3.12: Newly added eager task execution means
      the result may be "None".

   get_context()

      Return the "contextvars.Context" object associated with the
      task.

      Added in version 3.12.

   get_name()

      Return the name of the Task.

      If no name has been explicitly assigned to the Task, the
      default asyncio Task implementation generates a default name
      during instantiation.

      Added in version 3.8.

   set_name(value)

      Set the name of the Task.

      The *value* argument can be any object, which is then converted
      to a string.

      In the default Task implementation, the name will be visible in
      the "repr()" output of a task object.
      Added in version 3.8.

   cancel(msg=None)

      Request the Task to be cancelled.

      If the Task is already *done* or *cancelled*, return "False";
      otherwise, return "True".

      The method arranges for a "CancelledError" exception to be
      thrown into the wrapped coroutine on the next cycle of the
      event loop.

      The coroutine then has a chance to clean up or even deny the
      request by suppressing the exception with a "try" …
      "except CancelledError" … "finally" block. Therefore, unlike
      "Future.cancel()", "Task.cancel()" does not guarantee that the
      Task will be cancelled, although suppressing cancellation
      completely is not common and is actively discouraged. Should
      the coroutine nevertheless decide to suppress the cancellation,
      it needs to call "Task.uncancel()" in addition to catching the
      exception.

      Changed in version 3.9: Added the *msg* parameter.

      Changed in version 3.11: The "msg" parameter is propagated from
      the cancelled task to its awaiter.

      The following example illustrates how coroutines can intercept
      the cancellation request:

         async def cancel_me():
             print('cancel_me(): before sleep')

             try:
                 # Wait for 1 hour
                 await asyncio.sleep(3600)
             except asyncio.CancelledError:
                 print('cancel_me(): cancel sleep')
                 raise
             finally:
                 print('cancel_me(): after sleep')

         async def main():
             # Create a "cancel_me" Task
             task = asyncio.create_task(cancel_me())

             # Wait for 1 second
             await asyncio.sleep(1)

             task.cancel()
             try:
                 await task
             except asyncio.CancelledError:
                 print("main(): cancel_me is cancelled now")

         asyncio.run(main())

         # Expected output:
         #
         #     cancel_me(): before sleep
         #     cancel_me(): cancel sleep
         #     cancel_me(): after sleep
         #     main(): cancel_me is cancelled now

   cancelled()

      Return "True" if the Task is *cancelled*.

      The Task is *cancelled* when the cancellation was requested
      with "cancel()" and the wrapped coroutine propagated the
      "CancelledError" exception thrown into it.

   uncancel()

      Decrement the count of cancellation requests to this Task.

      Returns the remaining number of cancellation requests.

      Note that once execution of a cancelled task has completed,
      further calls to "uncancel()" are ineffective.

      Added in version 3.11.

      This method is used by asyncio’s internals and isn’t expected
      to be used by end-user code. In particular, if a Task gets
      successfully uncancelled, this allows for elements of
      structured concurrency like Task Groups and "asyncio.timeout()"
      to continue running, isolating cancellation to the respective
      structured block. For example:

         async def make_request_with_timeout():
             try:
                 async with asyncio.timeout(1):
                     # Structured block affected by the timeout:
                     await make_request()
                     await make_another_request()
             except TimeoutError:
                 log("There was a timeout")
             # Outer code not affected by the timeout:
             await unrelated_code()

      While the block with "make_request()" and
      "make_another_request()" might get cancelled due to the
      timeout, "unrelated_code()" should continue running even in
      case of the timeout. This is implemented with "uncancel()".
      "TaskGroup" context managers use "uncancel()" in a similar
      fashion.

      If end-user code is, for some reason, suppressing cancellation
      by catching "CancelledError", it needs to call this method to
      remove the cancellation state.

      When this method decrements the cancellation count to zero, the
      method checks if a previous "cancel()" call had arranged for
      "CancelledError" to be thrown into the task. If it hasn’t been
      thrown yet, that arrangement will be rescinded (by resetting
      the internal "_must_cancel" flag).

      Changed in version 3.13: Changed to rescind pending
      cancellation requests upon reaching zero.
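      If end-user code does suppress cancellation, a sketch of the
      required pattern might look like this (the coroutine name is
      illustrative; as noted above, suppressing cancellation is
      discouraged):

         import asyncio

         async def stubborn():
             try:
                 await asyncio.sleep(3600)
             except asyncio.CancelledError:
                 # Cancellation is being suppressed, so the
                 # cancellation state must be removed explicitly.
                 asyncio.current_task().uncancel()
                 return 'cancelled, but carrying on'

         async def main():
             task = asyncio.create_task(stubborn())
             await asyncio.sleep(0)
             task.cancel()
             print(await task)

         asyncio.run(main())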
   cancelling()

      Return the number of pending cancellation requests to this
      Task, i.e., the number of calls to "cancel()" less the number
      of "uncancel()" calls.

      Note that if this number is greater than zero but the Task is
      still executing, "cancelled()" will still return "False". This
      is because this number can be lowered by calling "uncancel()",
      which can lead to the task not being cancelled after all if the
      cancellation requests go down to zero.

      This method is used by asyncio’s internals and isn’t expected
      to be used by end-user code. See "uncancel()" for more details.

      Added in version 3.11.

"asyncio" — Asynchronous I/O
****************************

======================================================================

Hello World!
^^^^^^^^^^^^

   import asyncio

   async def main():
       print('Hello ...')
       await asyncio.sleep(1)
       print('... World!')

   asyncio.run(main())

asyncio is a library to write **concurrent** code using the
**async/await** syntax.

asyncio is used as a foundation for multiple Python asynchronous
frameworks that provide high-performance network and web-servers,
database connection libraries, distributed task queues, etc.

asyncio is often a perfect fit for IO-bound and high-level
**structured** network code.

asyncio provides a set of **high-level** APIs to:

* run Python coroutines concurrently and have full control over their
  execution;

* perform network IO and IPC;

* control subprocesses;

* distribute tasks via queues;

* synchronize concurrent code.

Additionally, there are **low-level** APIs for *library and framework
developers* to:

* create and manage event loops, which provide asynchronous APIs for
  networking, running subprocesses, handling OS signals, etc.;

* implement efficient protocols using transports;

* bridge callback-based libraries and code with async/await syntax.

Availability: not WASI. This module does not work or is not available
on WebAssembly. See WebAssembly platforms for more information.

-[ asyncio REPL ]-

You can experiment with an "asyncio" concurrent context in the
*REPL*:

   $ python -m asyncio
   asyncio REPL ...
   Use "await" directly instead of "asyncio.run()".
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import asyncio
   >>> await asyncio.sleep(10, result='hello')
   'hello'

Raises an auditing event "cpython.run_stdin" with no arguments.

Changed in version 3.12.5: (also 3.11.10, 3.10.15, 3.9.20, and
3.8.20) Emits audit events.

Changed in version 3.13: Uses PyREPL if possible, in which case
"PYTHONSTARTUP" is also executed. Emits audit events.

-[ Reference ]-

High-level APIs
^^^^^^^^^^^^^^^

* Runners

* Coroutines and Tasks

* Streams

* Synchronization Primitives

* Subprocesses

* Queues

* Exceptions

Low-level APIs
^^^^^^^^^^^^^^

* Event Loop

* Futures

* Transports and Protocols

* Policies

* Platform Support

* Extending

Guides and Tutorials
^^^^^^^^^^^^^^^^^^^^

* High-level API Index

* Low-level API Index

* Developing with asyncio

Note: The source code for asyncio can be found in Lib/asyncio/.

"asyncore" — Asynchronous socket handler
****************************************

Deprecated since version 3.6, removed in version 3.12.

This module is no longer part of the Python standard library. It was
removed in Python 3.12 after being deprecated in Python 3.6. The
removal was decided in **PEP 594**.

Applications should use the "asyncio" module instead.

The last version of Python that provided the "asyncore" module was
Python 3.11.
"atexit" — Exit handlers ************************ ====================================================================== The "atexit" module defines functions to register and unregister cleanup functions. Functions thus registered are automatically executed upon normal interpreter termination. "atexit" runs these functions in the *reverse* order in which they were registered; if you register "A", "B", and "C", at interpreter termination time they will be run in the order "C", "B", "A". **Note:** The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when "os._exit()" is called. **Note:** The effect of registering or unregistering functions from within a cleanup function is undefined. Changed in version 3.7: When used with C-API subinterpreters, registered functions are local to the interpreter they were registered in. atexit.register(func, *args, **kwargs) Register *func* as a function to be executed at termination. Any optional arguments that are to be passed to *func* must be passed as arguments to "register()". It is possible to register the same function and arguments more than once. At normal program termination (for instance, if "sys.exit()" is called or the main module’s execution completes), all functions registered are called in last in, first out order. The assumption is that lower level modules will normally be imported before higher level modules and thus must be cleaned up later. If an exception is raised during execution of the exit handlers, a traceback is printed (unless "SystemExit" is raised) and the exception information is saved. After all exit handlers have had a chance to run, the last exception to be raised is re-raised. This function returns *func*, which makes it possible to use it as a decorator. Warning: Starting new threads or calling "os.fork()" from a registered function can lead to race condition between the main Python runtime thread freeing thread states while internal "threading" routines or the new process try to use that state. This can lead to crashes rather than clean shutdown. Changed in version 3.12: Attempts to start a new thread or "os.fork()" a new process in a registered function now leads to "RuntimeError". atexit.unregister(func) Remove *func* from the list of functions to be run at interpreter shutdown. "unregister()" silently does nothing if *func* was not previously registered. If *func* has been registered more than once, every occurrence of that function in the "atexit" call stack will be removed. Equality comparisons ("==") are used internally during unregistration, so function references do not need to have matching identities. See also: Module "readline" Useful example of "atexit" to read and write "readline" history files. "atexit" Example ================ The following simple example demonstrates how a module can initialize a counter from a file when it is imported and save the counter’s updated value automatically when the program terminates without relying on the application making an explicit call into this module at termination. 
   try:
       with open('counterfile') as infile:
           _count = int(infile.read())
   except FileNotFoundError:
       _count = 0

   def incrcounter(n):
       global _count
       _count = _count + n

   def savecounter():
       with open('counterfile', 'w') as outfile:
           outfile.write('%d' % _count)

   import atexit

   atexit.register(savecounter)

Positional and keyword arguments may also be passed to "register()" to be passed along to the registered function when it is called:

   def goodbye(name, adjective):
       print('Goodbye %s, it was %s to meet you.' % (name, adjective))

   import atexit

   atexit.register(goodbye, 'Donny', 'nice')

   # or:
   atexit.register(goodbye, adjective='nice', name='Donny')

Usage as a *decorator*:

   import atexit

   @atexit.register
   def goodbye():
       print('You are now leaving the Python sector.')

This only works with functions that can be called without arguments.

"audioop" — Manipulate raw audio data
*************************************

Deprecated since version 3.11, removed in version 3.13.

This module is no longer part of the Python standard library. It was removed in Python 3.13 after being deprecated in Python 3.11. The removal was decided in **PEP 594**. The last version of Python that provided the "audioop" module was Python 3.12.

Audit events table
******************

This table contains all events raised by "sys.audit()" or "PySys_Audit()" calls throughout the CPython runtime and the standard library. These calls were added in 3.8 or later (see **PEP 578**).

See "sys.addaudithook()" and "PySys_AddAuditHook()" for information on handling these events.

**CPython implementation detail:** This table is generated from the CPython documentation, and may not represent events raised by other implementations. See your runtime specific documentation for actual events raised.

+--------------------------------+---------------------------------------------------------+-----------------+
| Audit event                    | Arguments                                               | References      |
|================================|=========================================================|=================|
| _thread.start_new_thread       | "function", "args", "kwargs"                            | [1]             |
+--------------------------------+---------------------------------------------------------+-----------------+
| array.__new__                  | "typecode", "initializer"                               | [1]             |
+--------------------------------+---------------------------------------------------------+-----------------+
| builtins.breakpoint            | "breakpointhook"                                        | [1]             |
+--------------------------------+---------------------------------------------------------+-----------------+
| builtins.id                    | "id"                                                    | [1]             |
+--------------------------------+---------------------------------------------------------+-----------------+
| builtins.input                 | "prompt"                                                | [1]             |
+--------------------------------+---------------------------------------------------------+-----------------+
| builtins.input/result          | "result"                                                | [1]             |
+--------------------------------+---------------------------------------------------------+-----------------+
| code.__new__                   | "code", "filename", "name", "argcount",                 | [1]             |
|                                | "posonlyargcount", "kwonlyargcount", "nlocals",         |                 |
|                                | "stacksize", "flags"                                    |                 |
+--------------------------------+---------------------------------------------------------+-----------------+
| compile                        | "source", "filename"                                    | [1]             |
+--------------------------------+---------------------------------------------------------+-----------------+
| cpython.PyInterpreterState_Cl  |                                                         | [1]             |
| ear                            |                                                         |                 |
+--------------------------------+---------------------------------------------------------+-----------------+
| 
cpython.PyInterpreterState_New | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | cpython._PySys_ClearAuditHooks | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | cpython.run_command | "command" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | cpython.run_file | "filename" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | cpython.run_interactivehook | "hook" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | cpython.run_module | "module-name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | cpython.run_startup | "filename" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | cpython.run_stdin | | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.addressof | "obj" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.call_function | "func_pointer", "arguments" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.cdata | "address" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.cdata/buffer | "pointer", "size", "offset" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.create_string_buffer | "init", "size" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.create_unicode_buffer | "init", "size" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.dlopen | "name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.dlsym | "library", "name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.dlsym/handle | "handle", "name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.get_errno | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.get_last_error | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.set_errno | "errno" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.set_exception | "code" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.set_last_error | "error" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ctypes.string_at | "ptr", "size" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | 
ctypes.wstring_at | "ptr", "size" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ensurepip.bootstrap | "root" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | exec | "code_object" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | fcntl.fcntl | "fd", "cmd", "arg" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | fcntl.flock | "fd", "operation" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | fcntl.ioctl | "fd", "request", "arg" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | fcntl.lockf | "fd", "cmd", "len", "start", "whence" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ftplib.connect | "self", "host", "port" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | ftplib.sendcmd | "self", "cmd" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | function.__new__ | "code" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | gc.get_objects | "generation" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | gc.get_referents | "objs" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | gc.get_referrers | "objs" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | glob.glob | "pathname", "recursive" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | glob.glob/2 | "pathname", "recursive", "root_dir", "dir_fd" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | http.client.connect | "self", "host", "port" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | http.client.send | "self", "data" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | imaplib.open | "self", "host", "port" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | imaplib.send | "self", "data" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | import | "module", "filename", "sys.path", "sys.meta_path", | [1] | | | "sys.path_hooks" | | +--------------------------------+---------------------------------------------------------+-----------------+ | marshal.dumps | "value", "version" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | marshal.load | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | marshal.loads | "bytes" | [1] | 
+--------------------------------+---------------------------------------------------------+-----------------+ | mmap.__new__ | "fileno", "length", "access", "offset" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | msvcrt.get_osfhandle | "fd" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | msvcrt.locking | "fd", "mode", "nbytes" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | msvcrt.open_osfhandle | "handle", "flags" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | object.__delattr__ | "obj", "name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | object.__getattr__ | "obj", "name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | object.__setattr__ | "obj", "name", "value" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | open | "path", "mode", "flags" | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.add_dll_directory | "path" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.chdir | "path" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.chflags | "path", "flags" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.chmod | "path", "mode", "dir_fd" | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.chown | "path", "uid", "gid", "dir_fd" | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.exec | "path", "args", "env" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.fork | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.forkpty | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.fwalk | "top", "topdown", "onerror", "follow_symlinks", | [1] | | | "dir_fd" | | +--------------------------------+---------------------------------------------------------+-----------------+ | os.getxattr | "path", "attribute" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.kill | "pid", "sig" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.killpg | "pgid", "sig" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.link | "src", "dst", "src_dir_fd", "dst_dir_fd" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.listdir | "path" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | 
os.listdrives | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.listmounts | "volume" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.listvolumes | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.listxattr | "path" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.lockf | "fd", "cmd", "len" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.mkdir | "path", "mode", "dir_fd" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.posix_spawn | "path", "argv", "env" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.putenv | "key", "value" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.remove | "path", "dir_fd" | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.removexattr | "path", "attribute" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.rename | "src", "dst", "src_dir_fd", "dst_dir_fd" | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.rmdir | "path", "dir_fd" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.scandir | "path" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.setxattr | "path", "attribute", "value", "flags" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.spawn | "mode", "path", "args", "env" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.startfile | "path", "operation" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.startfile/2 | "path", "operation", "arguments", "cwd", "show_cmd" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.symlink | "src", "dst", "dir_fd" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.system | "command" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.truncate | "fd", "length" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.unsetenv | "key" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.utime | "path", "times", "ns", "dir_fd" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | os.walk | "top", "topdown", "onerror", "followlinks" | [1] | 
+--------------------------------+---------------------------------------------------------+-----------------+ | pathlib.Path.glob | "self", "pattern" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | pathlib.Path.rglob | "self", "pattern" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | pdb.Pdb | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | pickle.find_class | "module", "name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | poplib.connect | "self", "host", "port" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | poplib.putline | "self", "line" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | pty.spawn | "argv" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | resource.prlimit | "pid", "resource", "limits" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | resource.setrlimit | "resource", "limits" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | setopencodehook | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.chown | "path", "user", "group" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.copyfile | "src", "dst" | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.copymode | "src", "dst" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.copystat | "src", "dst" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.copytree | "src", "dst" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.make_archive | "base_name", "format", "root_dir", "base_dir" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.move | "src", "dst" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.rmtree | "path", "dir_fd" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | shutil.unpack_archive | "filename", "extract_dir", "format" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | signal.pthread_kill | "thread_id", "signalnum" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | smtplib.connect | "self", "host", "port" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | smtplib.send | "self", "data" | [1] | 
+--------------------------------+---------------------------------------------------------+-----------------+ | socket.__new__ | "self", "family", "type", "protocol" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.bind | "self", "address" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.connect | "self", "address" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.getaddrinfo | "host", "port", "family", "type", "protocol" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.gethostbyaddr | "ip_address" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.gethostbyname | "hostname" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.gethostname | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.getnameinfo | "sockaddr" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.getservbyname | "servicename", "protocolname" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.getservbyport | "port", "protocolname" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.sendmsg | "self", "address" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.sendto | "self", "address" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | socket.sethostname | "name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sqlite3.connect | "database" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sqlite3.connect/handle | "connection_handle" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sqlite3.enable_load_extension | "connection", "enabled" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sqlite3.load_extension | "connection", "path" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | subprocess.Popen | "executable", "args", "cwd", "env" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sys._current_exceptions | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sys._current_frames | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sys._getframe | "frame" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sys._getframemodulename | "depth" | [1] | 
+--------------------------------+---------------------------------------------------------+-----------------+ | sys.addaudithook | | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | sys.excepthook | "hook", "type", "value", "traceback" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sys.set_asyncgen_hooks_finali | | [1] | | zer | | | +--------------------------------+---------------------------------------------------------+-----------------+ | sys.set_asyncgen_hooks_firsti | | [1] | | ter | | | +--------------------------------+---------------------------------------------------------+-----------------+ | sys.setprofile | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sys.settrace | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | sys.unraisablehook | "hook", "unraisable" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | syslog.closelog | | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | syslog.openlog | "ident", "logoption", "facility" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | syslog.setlogmask | "maskpri" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | syslog.syslog | "priority", "message" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | tempfile.mkdtemp | "fullpath" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | tempfile.mkstemp | "fullpath" | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | time.sleep | "secs" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | urllib.Request | "fullurl", "data", "headers", "method" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | webbrowser.open | "url" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.ConnectRegistry | "computer_name", "key" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.CreateKey | "key", "sub_key", "access" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.DeleteKey | "key", "sub_key", "access" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.DeleteValue | "key", "value" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.DisableReflectionKey | "key" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.EnableReflectionKey | "key" | [1] | 
+--------------------------------+---------------------------------------------------------+-----------------+ | winreg.EnumKey | "key", "index" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.EnumValue | "key", "index" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.ExpandEnvironmentStrin | "str" | [1] | | gs | | | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.LoadKey | "key", "sub_key", "file_name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.OpenKey | "key", "sub_key", "access" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.OpenKey/result | "key" | [1][2][3] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.PyHKEY.Detach | "key" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.QueryInfoKey | "key" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.QueryReflectionKey | "key" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.QueryValue | "key", "sub_key", "value_name" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.SaveKey | "key", "file_name" | [1] | +--------------------------------+---------------------------------------------------------+-----------------+ | winreg.SetValue | "key", "sub_key", "type", "value" | [1][2] | +--------------------------------+---------------------------------------------------------+-----------------+ The following events are raised internally and do not correspond to any public API of CPython: +----------------------------+---------------------------------------------+ | Audit event | Arguments | |============================|=============================================| | _winapi.CreateFile | "file_name", "desired_access", | | | "share_mode", "creation_disposition", | | | "flags_and_attributes" | +----------------------------+---------------------------------------------+ | _winapi.CreateJunction | "src_path", "dst_path" | +----------------------------+---------------------------------------------+ | _winapi.CreateNamedPipe | "name", "open_mode", "pipe_mode" | +----------------------------+---------------------------------------------+ | _winapi.CreatePipe | | +----------------------------+---------------------------------------------+ | _winapi.CreateProcess | "application_name", "command_line", | | | "current_directory" | +----------------------------+---------------------------------------------+ | _winapi.OpenProcess | "process_id", "desired_access" | +----------------------------+---------------------------------------------+ | _winapi.TerminateProcess | "handle", "exit_code" | +----------------------------+---------------------------------------------+ | ctypes.PyObj_FromPtr | "obj" | +----------------------------+---------------------------------------------+ "base64" — Base16, Base32, Base64, Base85 Data Encodings ******************************************************** 
**Source code:** Lib/base64.py ====================================================================== This module provides functions for encoding binary data to printable ASCII characters and decoding such encodings back to binary data. This includes the encodings specified in **RFC 4648** (Base64, Base32 and Base16) and the non-standard Base85 encodings. There are two interfaces provided by this module. The modern interface supports encoding *bytes-like objects* to ASCII "bytes", and decoding *bytes-like objects* or strings containing ASCII to "bytes". Both base-64 alphabets defined in **RFC 4648** (normal, and URL- and filesystem-safe) are supported. The legacy interface does not support decoding from strings, but it does provide functions for encoding and decoding to and from *file objects*. It only supports the Base64 standard alphabet, and it adds newlines every 76 characters as per **RFC 2045**. Note that if you are looking for **RFC 2045** support you probably want to be looking at the "email" package instead. Changed in version 3.3: ASCII-only Unicode strings are now accepted by the decoding functions of the modern interface. Changed in version 3.4: Any *bytes-like objects* are now accepted by all encoding and decoding functions in this module. Ascii85/Base85 support added. RFC 4648 Encodings ================== The **RFC 4648** encodings are suitable for encoding binary data so that it can be safely sent by email, used as parts of URLs, or included as part of an HTTP POST request. base64.b64encode(s, altchars=None) Encode the *bytes-like object* *s* using Base64 and return the encoded "bytes". Optional *altchars* must be a *bytes-like object* of length 2 which specifies an alternative alphabet for the "+" and "/" characters. This allows an application to e.g. generate URL or filesystem safe Base64 strings. The default is "None", for which the standard Base64 alphabet is used. May assert or raise a "ValueError" if the length of *altchars* is not 2. Raises a "TypeError" if *altchars* is not a *bytes-like object*. base64.b64decode(s, altchars=None, validate=False) Decode the Base64 encoded *bytes-like object* or ASCII string *s* and return the decoded "bytes". Optional *altchars* must be a *bytes-like object* or ASCII string of length 2 which specifies the alternative alphabet used instead of the "+" and "/" characters. A "binascii.Error" exception is raised if *s* is incorrectly padded. If *validate* is "False" (the default), characters that are neither in the normal base-64 alphabet nor the alternative alphabet are discarded prior to the padding check. If *validate* is "True", these non-alphabet characters in the input result in a "binascii.Error". For more information about the strict base64 check, see "binascii.a2b_base64()" May assert or raise a "ValueError" if the length of *altchars* is not 2. base64.standard_b64encode(s) Encode *bytes-like object* *s* using the standard Base64 alphabet and return the encoded "bytes". base64.standard_b64decode(s) Decode *bytes-like object* or ASCII string *s* using the standard Base64 alphabet and return the decoded "bytes". base64.urlsafe_b64encode(s) Encode *bytes-like object* *s* using the URL- and filesystem-safe alphabet, which substitutes "-" instead of "+" and "_" instead of "/" in the standard Base64 alphabet, and return the encoded "bytes". The result can still contain "=". 
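For example, the standard and URL-safe alphabets differ only in the characters used for the values 62 and 63, so encoding the same input shows the substitution directly (a small illustrative session):

   >>> import base64
   >>> base64.b64encode(b'\xfb\xff')
   b'+/8='
   >>> base64.urlsafe_b64encode(b'\xfb\xff')
   b'-_8='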
base64.urlsafe_b64decode(s) Decode *bytes-like object* or ASCII string *s* using the URL- and filesystem-safe alphabet, which substitutes "-" instead of "+" and "_" instead of "/" in the standard Base64 alphabet, and return the decoded "bytes".

base64.b32encode(s) Encode the *bytes-like object* *s* using Base32 and return the encoded "bytes".

base64.b32decode(s, casefold=False, map01=None) Decode the Base32 encoded *bytes-like object* or ASCII string *s* and return the decoded "bytes".

Optional *casefold* is a flag specifying whether a lowercase alphabet is acceptable as input. For security purposes, the default is "False".

**RFC 4648** allows for optional mapping of the digit 0 (zero) to the letter O (oh), and for optional mapping of the digit 1 (one) to either the letter I (eye) or letter L (el). The optional argument *map01*, when not "None", specifies which letter the digit 1 should be mapped to (when *map01* is not "None", the digit 0 is always mapped to the letter O). For security purposes the default is "None", so that 0 and 1 are not allowed in the input.

A "binascii.Error" is raised if *s* is incorrectly padded or if there are non-alphabet characters present in the input.

base64.b32hexencode(s) Similar to "b32encode()" but uses the Extended Hex Alphabet, as defined in **RFC 4648**.

Added in version 3.10.

base64.b32hexdecode(s, casefold=False) Similar to "b32decode()" but uses the Extended Hex Alphabet, as defined in **RFC 4648**.

This version does not allow mapping the digit 0 (zero) to the letter O (oh), or the digit 1 (one) to either the letter I (eye) or the letter L (el); all these characters are included in the Extended Hex Alphabet and are not interchangeable.

Added in version 3.10.

base64.b16encode(s) Encode the *bytes-like object* *s* using Base16 and return the encoded "bytes".

base64.b16decode(s, casefold=False) Decode the Base16 encoded *bytes-like object* or ASCII string *s* and return the decoded "bytes".

Optional *casefold* is a flag specifying whether a lowercase alphabet is acceptable as input. For security purposes, the default is "False".

A "binascii.Error" is raised if *s* is incorrectly padded or if there are non-alphabet characters present in the input.

Base85 Encodings
================

Base85 encoding is not formally specified but is rather a de facto standard; thus, different systems perform the encoding differently.

The "a85encode()" and "b85encode()" functions in this module are two implementations of the de facto standard. You should call the function with the Base85 implementation used by the software you intend to work with.

The two functions present in this module differ in how they handle the following:

* Whether to include enclosing "<~" and "~>" markers

* Whether to include newline characters

* The set of ASCII characters used for encoding

* Handling of null bytes

Refer to the documentation of the individual functions for more information.

base64.a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False) Encode the *bytes-like object* *b* using Ascii85 and return the encoded "bytes".

*foldspaces* is an optional flag that uses the special short sequence ‘y’ instead of 4 consecutive spaces (ASCII 0x20) as supported by ‘btoa’. This feature is not supported by the “standard” Ascii85 encoding.

*wrapcol* controls whether the output should have newline ("b'\n'") characters added to it. If this is non-zero, each output line will be at most this many characters long, excluding the trailing newline.
*pad* controls whether the input is padded to a multiple of 4 before encoding. Note that the "btoa" implementation always pads.

*adobe* controls whether the encoded byte sequence is framed with "<~" and "~>", which is used by the Adobe implementation.

Added in version 3.4.

base64.a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\x0b') Decode the Ascii85 encoded *bytes-like object* or ASCII string *b* and return the decoded "bytes".

*foldspaces* is a flag that specifies whether the ‘y’ short sequence should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20). This feature is not supported by the “standard” Ascii85 encoding.

*adobe* controls whether the input sequence is in Adobe Ascii85 format (i.e. is framed with <~ and ~>).

*ignorechars* should be a *bytes-like object* or ASCII string containing characters to ignore from the input. This should only contain whitespace characters, and by default contains all whitespace characters in ASCII.

Added in version 3.4.

base64.b85encode(b, pad=False) Encode the *bytes-like object* *b* using base85 (as used in e.g. git-style binary diffs) and return the encoded "bytes".

If *pad* is true, the input is padded with "b'\0'" so its length is a multiple of 4 bytes before encoding.

Added in version 3.4.

base64.b85decode(b) Decode the base85-encoded *bytes-like object* or ASCII string *b* and return the decoded "bytes". Padding is implicitly removed, if necessary.

Added in version 3.4.

base64.z85encode(s) Encode the *bytes-like object* *s* using Z85 (as used in ZeroMQ) and return the encoded "bytes". See Z85 specification for more information.

Added in version 3.13.

base64.z85decode(s) Decode the Z85-encoded *bytes-like object* or ASCII string *s* and return the decoded "bytes". See Z85 specification for more information.

Added in version 3.13.

Legacy Interface
================

base64.decode(input, output) Decode the contents of the binary *input* file and write the resulting binary data to the *output* file. *input* and *output* must be *file objects*. *input* will be read until "input.readline()" returns an empty bytes object.

base64.decodebytes(s) Decode the *bytes-like object* *s*, which must contain one or more lines of base64 encoded data, and return the decoded "bytes".

Added in version 3.1.

base64.encode(input, output) Encode the contents of the binary *input* file and write the resulting base64 encoded data to the *output* file. *input* and *output* must be *file objects*. *input* will be read until "input.read()" returns an empty bytes object. "encode()" inserts a newline character ("b'\n'") after every 76 bytes of the output, as well as ensuring that the output always ends with a newline, as per **RFC 2045** (MIME).

base64.encodebytes(s) Encode the *bytes-like object* *s*, which can contain arbitrary binary data, and return "bytes" containing the base64-encoded data, with newlines ("b'\n'") inserted after every 76 bytes of output, and ensuring that there is a trailing newline, as per **RFC 2045** (MIME).

Added in version 3.1.

An example usage of the module:

   >>> import base64
   >>> encoded = base64.b64encode(b'data to be encoded')
   >>> encoded
   b'ZGF0YSB0byBiZSBlbmNvZGVk'
   >>> data = base64.b64decode(encoded)
   >>> data
   b'data to be encoded'

Security Considerations
=======================

A new security considerations section was added to **RFC 4648** (section 12); it’s recommended to review the security section for any code deployed to production.
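In that spirit, note that "b64decode()" silently discards non-alphabet bytes unless validation is requested, so code handling untrusted input should pass "validate=True". A minimal sketch:

   import base64
   import binascii

   print(base64.b64decode(b'aG!k='))        # b'hi' -- the "!" is discarded

   try:
       base64.b64decode(b'aG!k=', validate=True)
   except binascii.Error:
       print('rejected')                    # non-alphabet input now raises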
See also: Module "binascii" Support module containing ASCII-to-binary and binary-to-ASCII conversions. **RFC 1521** - MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies Section 5.2, “Base64 Content-Transfer-Encoding,” provides the definition of the base64 encoding. "bdb" — Debugger framework ************************** **Source code:** Lib/bdb.py ====================================================================== The "bdb" module handles basic debugger functions, like setting breakpoints or managing execution via the debugger. The following exception is defined: exception bdb.BdbQuit Exception raised by the "Bdb" class for quitting the debugger. The "bdb" module also defines two classes: class bdb.Breakpoint(self, file, line, temporary=False, cond=None, funcname=None) This class implements temporary breakpoints, ignore counts, disabling and (re-)enabling, and conditionals. Breakpoints are indexed by number through a list called "bpbynumber" and by "(file, line)" pairs through "bplist". The former points to a single instance of class "Breakpoint". The latter points to a list of such instances since there may be more than one breakpoint per line. When creating a breakpoint, its associated "file name" should be in canonical form. If a "funcname" is defined, a breakpoint "hit" will be counted when the first line of that function is executed. A "conditional" breakpoint always counts a "hit". "Breakpoint" instances have the following methods: deleteMe() Delete the breakpoint from the list associated to a file/line. If it is the last breakpoint in that position, it also deletes the entry for the file/line. enable() Mark the breakpoint as enabled. disable() Mark the breakpoint as disabled. bpformat() Return a string with all the information about the breakpoint, nicely formatted: * Breakpoint number. * Temporary status (del or keep). * File/line position. * Break condition. * Number of times to ignore. * Number of times hit. Added in version 3.2. bpprint(out=None) Print the output of "bpformat()" to the file *out*, or if it is "None", to standard output. "Breakpoint" instances have the following attributes: file File name of the "Breakpoint". line Line number of the "Breakpoint" within "file". temporary "True" if a "Breakpoint" at (file, line) is temporary. cond Condition for evaluating a "Breakpoint" at (file, line). funcname Function name that defines whether a "Breakpoint" is hit upon entering the function. enabled "True" if "Breakpoint" is enabled. bpbynumber Numeric index for a single instance of a "Breakpoint". bplist Dictionary of "Breakpoint" instances indexed by ("file", "line") tuples. ignore Number of times to ignore a "Breakpoint". hits Count of the number of times a "Breakpoint" has been hit. class bdb.Bdb(skip=None) The "Bdb" class acts as a generic Python debugger base class. This class takes care of the details of the trace facility; a derived class should implement user interaction. The standard debugger class ("pdb.Pdb") is an example. The *skip* argument, if given, must be an iterable of glob-style module name patterns. The debugger will not step into frames that originate in a module that matches one of these patterns. Whether a frame is considered to originate in a certain module is determined by the "__name__" in the frame globals. Changed in version 3.1: Added the *skip* parameter. The following methods of "Bdb" normally don’t need to be overridden. 
canonic(filename) Return canonical form of *filename*.

For real file names, the canonical form is an operating-system-dependent, "case-normalized" "absolute path". A *filename* with angle brackets, such as ""<stdin>"" generated in interactive mode, is returned unchanged.

reset() Set the "botframe", "stopframe", "returnframe" and "quitting" attributes with values ready to start debugging.

trace_dispatch(frame, event, arg) This function is installed as the trace function of debugged frames. Its return value is the new trace function (in most cases, that is, itself).

The default implementation decides how to dispatch a frame, depending on the type of event (passed as a string) that is about to be executed. *event* can be one of the following:

* ""line"": A new line of code is going to be executed.

* ""call"": A function is about to be called, or another code block entered.

* ""return"": A function or other code block is about to return.

* ""exception"": An exception has occurred.

* ""c_call"": A C function is about to be called.

* ""c_return"": A C function has returned.

* ""c_exception"": A C function has raised an exception.

For the Python events, specialized functions (see below) are called. For the C events, no action is taken.

The *arg* parameter depends on the previous event.

See the documentation for "sys.settrace()" for more information on the trace function. For more information on code and frame objects, refer to The standard type hierarchy.

dispatch_line(frame) If the debugger should stop on the current line, invoke the "user_line()" method (which should be overridden in subclasses). Raise a "BdbQuit" exception if the "quitting" flag is set (which can be set from "user_line()"). Return a reference to the "trace_dispatch()" method for further tracing in that scope.

dispatch_call(frame, arg) If the debugger should stop on this function call, invoke the "user_call()" method (which should be overridden in subclasses). Raise a "BdbQuit" exception if the "quitting" flag is set (which can be set from "user_call()"). Return a reference to the "trace_dispatch()" method for further tracing in that scope.

dispatch_return(frame, arg) If the debugger should stop on this function return, invoke the "user_return()" method (which should be overridden in subclasses). Raise a "BdbQuit" exception if the "quitting" flag is set (which can be set from "user_return()"). Return a reference to the "trace_dispatch()" method for further tracing in that scope.

dispatch_exception(frame, arg) If the debugger should stop at this exception, invoke the "user_exception()" method (which should be overridden in subclasses). Raise a "BdbQuit" exception if the "quitting" flag is set (which can be set from "user_exception()"). Return a reference to the "trace_dispatch()" method for further tracing in that scope.

Normally derived classes don’t override the following methods, but they may if they want to redefine the definition of stopping and breakpoints.

is_skipped_module(module_name) Return "True" if *module_name* matches any skip pattern.

stop_here(frame) Return "True" if *frame* is below the starting frame in the stack.

break_here(frame) Return "True" if there is an effective breakpoint for this line.

Check whether a line or function breakpoint exists and is in effect. Delete temporary breakpoints based on information from "effective()".

break_anywhere(frame) Return "True" if any breakpoint exists for *frame*’s filename.

Derived classes should override these methods to gain control over debugger operation.
user_call(frame, argument_list) Called from "dispatch_call()" if a break might stop inside the called function. *argument_list* is not used anymore and will always be "None". The argument is kept for backwards compatibility.

user_line(frame) Called from "dispatch_line()" when either "stop_here()" or "break_here()" returns "True".

user_return(frame, return_value) Called from "dispatch_return()" when "stop_here()" returns "True".

user_exception(frame, exc_info) Called from "dispatch_exception()" when "stop_here()" returns "True".

do_clear(arg) Handle how a breakpoint must be removed when it is a temporary one. This method must be implemented by derived classes.

Derived classes and clients can call the following methods to affect the stepping state.

set_step() Stop after one line of code.

set_next(frame) Stop on the next line in or below the given frame.

set_return(frame) Stop when returning from the given frame.

set_until(frame, lineno=None) Stop when the line with the *lineno* greater than the current one is reached or when returning from the current frame.

set_trace([frame]) Start debugging from *frame*. If *frame* is not specified, debugging starts from the caller’s frame.

Changed in version 3.13: "set_trace()" will enter the debugger immediately, rather than on the next line of code to be executed.

set_continue() Stop only at breakpoints or when finished. If there are no breakpoints, set the system trace function to "None".

set_quit() Set the "quitting" attribute to "True". This raises "BdbQuit" in the next call to one of the "dispatch_*()" methods.

Derived classes and clients can call the following methods to manipulate breakpoints. These methods return a string containing an error message if something went wrong, or "None" if all is well.

set_break(filename, lineno, temporary=False, cond=None, funcname=None) Set a new breakpoint. If the *lineno* line doesn’t exist for the *filename* passed as argument, return an error message. The *filename* should be in canonical form, as described in the "canonic()" method.

clear_break(filename, lineno) Delete the breakpoints at *lineno* in *filename*. If none were set, return an error message.

clear_bpbynumber(arg) Delete the breakpoint which has the index *arg* in the "Breakpoint.bpbynumber". If *arg* is not numeric or out of range, return an error message.

clear_all_file_breaks(filename) Delete all breakpoints in *filename*. If none were set, return an error message.

clear_all_breaks() Delete all existing breakpoints. If none were set, return an error message.

get_bpbynumber(arg) Return a breakpoint specified by the given number. If *arg* is a string, it will be converted to a number. If *arg* is a non-numeric string, or if the given breakpoint never existed or has been deleted, a "ValueError" is raised.

Added in version 3.2.

get_break(filename, lineno) Return "True" if there is a breakpoint for *lineno* in *filename*.

get_breaks(filename, lineno) Return all breakpoints for *lineno* in *filename*, or an empty list if none are set.

get_file_breaks(filename) Return all breakpoints in *filename*, or an empty list if none are set.

get_all_breaks() Return all breakpoints that are set.

Derived classes and clients can call the following methods to get a data structure representing a stack trace.

get_stack(f, t) Return a list of (frame, lineno) tuples in a stack trace, and a size. The most recently called frame is last in the list. The size is the number of frames below the frame where the debugger was invoked.
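Putting several of these pieces together ("user_line()" and "set_step()" from above, plus "runcall()" documented below), here is a minimal, illustrative sketch of a "Bdb" subclass that prints each line of a traced call; the class and function names are hypothetical:

   import bdb

   class LineTracer(bdb.Bdb):
       def user_line(self, frame):
           # Called each time the debugger stops on a new line.
           filename = self.canonic(frame.f_code.co_filename)
           print('%s:%d in %s' % (filename, frame.f_lineno,
                                  frame.f_code.co_name))
           self.set_step()          # keep single-stepping

   def demo():
       total = 0
       for i in range(3):
           total += i
       return total

   LineTracer().runcall(demo)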
format_stack_entry(frame_lineno, lprefix=': ') Return a string with information about a stack entry, which is a "(frame, lineno)" tuple. The return string contains:

* The canonical filename which contains the frame.

* The function name or ""<lambda>"".

* The input arguments.

* The return value.

* The line of code (if it exists).

The following two methods can be called by clients to use a debugger to debug a *statement*, given as a string.

run(cmd, globals=None, locals=None) Debug a statement executed via the "exec()" function. *globals* defaults to "__main__.__dict__", *locals* defaults to *globals*.

runeval(expr, globals=None, locals=None) Debug an expression executed via the "eval()" function. *globals* and *locals* have the same meaning as in "run()".

runctx(cmd, globals, locals) For backwards compatibility. Calls the "run()" method.

runcall(func, /, *args, **kwds) Debug a single function call, and return its result.

Finally, the module defines the following functions:

bdb.checkfuncname(b, frame) Return "True" if we should break here, depending on the way the "Breakpoint" *b* was set.

If it was set via line number, it checks if "b.line" is the same as the one in *frame*. If the breakpoint was set via "function name", we have to check we are in the right *frame* (the right function) and if we are on its first executable line.

bdb.effective(file, line, frame) Return "(active breakpoint, delete temporary flag)" or "(None, None)" as the breakpoint to act upon.

The *active breakpoint* is the first entry in "bplist" for the ("file", "line") (which must exist) that is "enabled", for which "checkfuncname()" is true, and that has neither a false "condition" nor positive "ignore" count. The *flag*, meaning that a temporary breakpoint should be deleted, is "False" only when the "cond" cannot be evaluated (in which case, "ignore" count is ignored).

If no such entry exists, then "(None, None)" is returned.

bdb.set_trace() Start debugging with a "Bdb" instance from the caller’s frame.

Binary Data Services
********************

The modules described in this chapter provide basic operations for manipulating binary data. Other operations on binary data, specifically in relation to file formats and network protocols, are described in the relevant sections.

Some libraries described under Text Processing Services also work with either ASCII-compatible binary formats (for example, "re") or all binary data (for example, "difflib").

In addition, see the documentation for Python’s built-in binary data types in Binary Sequence Types — bytes, bytearray, memoryview.
* "struct" — Interpret bytes as packed binary data * Functions and Exceptions * Format Strings * Byte Order, Size, and Alignment * Format Characters * Examples * Applications * Native Formats * Standard Formats * Classes * "codecs" — Codec registry and base classes * Codec Base Classes * Error Handlers * Stateless Encoding and Decoding * Incremental Encoding and Decoding * IncrementalEncoder Objects * IncrementalDecoder Objects * Stream Encoding and Decoding * StreamWriter Objects * StreamReader Objects * StreamReaderWriter Objects * StreamRecoder Objects * Encodings and Unicode * Standard Encodings * Python Specific Encodings * Text Encodings * Binary Transforms * Text Transforms * "encodings.idna" — Internationalized Domain Names in Applications * "encodings.mbcs" — Windows ANSI codepage * "encodings.utf_8_sig" — UTF-8 codec with BOM signature "binascii" — Convert between binary and ASCII ********************************************* ====================================================================== The "binascii" module contains a number of methods to convert between binary and various ASCII-encoded binary representations. Normally, you will not use these functions directly but use wrapper modules like "base64" instead. The "binascii" module contains low-level functions written in C for greater speed that are used by the higher-level modules. Note: "a2b_*" functions accept Unicode strings containing only ASCII characters. Other functions only accept *bytes-like objects* (such as "bytes", "bytearray" and other objects that support the buffer protocol). Changed in version 3.3: ASCII-only unicode strings are now accepted by the "a2b_*" functions. The "binascii" module defines the following functions: binascii.a2b_uu(string) Convert a single line of uuencoded data back to binary and return the binary data. Lines normally contain 45 (binary) bytes, except for the last line. Line data may be followed by whitespace. binascii.b2a_uu(data, *, backtick=False) Convert binary data to a line of ASCII characters, the return value is the converted line, including a newline char. The length of *data* should be at most 45. If *backtick* is true, zeros are represented by "'`'" instead of spaces. Changed in version 3.7: Added the *backtick* parameter. binascii.a2b_base64(string, /, *, strict_mode=False) Convert a block of base64 data back to binary and return the binary data. More than one line may be passed at a time. If *strict_mode* is true, only valid base64 data will be converted. Invalid base64 data will raise "binascii.Error". Valid base64: * Conforms to **RFC 3548**. * Contains only characters from the base64 alphabet. * Contains no excess data after padding (including excess padding, newlines, etc.). * Does not start with a padding. Changed in version 3.11: Added the *strict_mode* parameter. binascii.b2a_base64(data, *, newline=True) Convert binary data to a line of ASCII characters in base64 coding. The return value is the converted line, including a newline char if *newline* is true. The output of this function conforms to **RFC 3548**. Changed in version 3.6: Added the *newline* parameter. binascii.a2b_qp(data, header=False) Convert a block of quoted-printable data back to binary and return the binary data. More than one line may be passed at a time. If the optional argument *header* is present and true, underscores will be decoded as spaces. binascii.b2a_qp(data, quotetabs=False, istext=True, header=False) Convert binary data to a line(s) of ASCII characters in quoted- printable encoding. 
   The return value is the converted line(s). If the optional
   argument *quotetabs* is present and true, all tabs and spaces will
   be encoded. If the optional argument *istext* is present and true,
   newlines are not encoded but trailing whitespace will be encoded.
   If the optional argument *header* is present and true, spaces will
   be encoded as underscores per **RFC 1522**. If the optional
   argument *header* is present and false, newline characters will be
   encoded as well; otherwise linefeed conversion might corrupt the
   binary data stream.

binascii.crc_hqx(data, value)

   Compute a 16-bit CRC value of *data*, starting with *value* as the
   initial CRC, and return the result. This uses the CRC-CCITT
   polynomial *x*^16 + *x*^12 + *x*^5 + 1, often represented as
   0x1021. This CRC is used in the binhex4 format.

binascii.crc32(data[, value])

   Compute CRC-32, the unsigned 32-bit checksum of *data*, starting
   with an initial CRC of *value*. The default initial CRC is zero.
   The algorithm is consistent with the ZIP file checksum. Since the
   algorithm is designed for use as a checksum algorithm, it is not
   suitable for use as a general hash algorithm. Use as follows:

      print(binascii.crc32(b"hello world"))
      # Or, in two pieces:
      crc = binascii.crc32(b"hello")
      crc = binascii.crc32(b" world", crc)
      print('crc32 = {:#010x}'.format(crc))

   Changed in version 3.0: The result is always unsigned.

binascii.b2a_hex(data[, sep[, bytes_per_sep=1]])
binascii.hexlify(data[, sep[, bytes_per_sep=1]])

   Return the hexadecimal representation of the binary *data*. Every
   byte of *data* is converted into the corresponding 2-digit hex
   representation. The returned bytes object is therefore twice as
   long as the length of *data*.

   Similar functionality (but returning a text string) is also
   conveniently accessible using the "bytes.hex()" method.

   If *sep* is specified, it must be a single character str or bytes
   object. It will be inserted in the output after every
   *bytes_per_sep* input bytes. Separator placement is counted from
   the right end of the output by default; if you wish to count from
   the left, supply a negative *bytes_per_sep* value.

      >>> import binascii
      >>> binascii.b2a_hex(b'\xb9\x01\xef')
      b'b901ef'
      >>> binascii.hexlify(b'\xb9\x01\xef', '-')
      b'b9-01-ef'
      >>> binascii.b2a_hex(b'\xb9\x01\xef', b'_', 2)
      b'b9_01ef'
      >>> binascii.b2a_hex(b'\xb9\x01\xef', b' ', -2)
      b'b901 ef'

   Changed in version 3.8: The *sep* and *bytes_per_sep* parameters
   were added.

binascii.a2b_hex(hexstr)
binascii.unhexlify(hexstr)

   Return the binary data represented by the hexadecimal string
   *hexstr*. This function is the inverse of "b2a_hex()". *hexstr*
   must contain an even number of hexadecimal digits (which can be
   upper or lower case), otherwise an "Error" exception is raised.

   Similar functionality (accepting only text string arguments, but
   more liberal towards whitespace) is also accessible using the
   "bytes.fromhex()" class method.

exception binascii.Error

   Exception raised on errors. These are usually programming errors.

exception binascii.Incomplete

   Exception raised on incomplete data. These are usually not
   programming errors, but may be handled by reading a little more
   data and trying again.

See also:

  Module "base64"
     Support for RFC compliant base64-style encoding in base 16, 32,
     64, and 85.

  Module "quopri"
     Support for quoted-printable encoding used in MIME email
     messages.
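As a quick illustration of how the "b2a_*" and "a2b_*" pairs invert
each other, here is a short base64 round trip (output shown as the
interpreter prints it; note the trailing newline added by
"b2a_base64()"):

   >>> import binascii
   >>> encoded = binascii.b2a_base64(b'binascii')
   >>> encoded
   b'YmluYXNjaWk=\n'
   >>> binascii.a2b_base64(encoded)
   b'binascii'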
"bisect" — Array bisection algorithm ************************************ **Source code:** Lib/bisect.py ====================================================================== This module provides support for maintaining a list in sorted order without having to sort the list after each insertion. For long lists of items with expensive comparison operations, this can be an improvement over linear searches or frequent resorting. The module is called "bisect" because it uses a basic bisection algorithm to do its work. Unlike other bisection tools that search for a specific value, the functions in this module are designed to locate an insertion point. Accordingly, the functions never call an "__eq__()" method to determine whether a value has been found. Instead, the functions only call the "__lt__()" method and will return an insertion point between values in an array. The following functions are provided: bisect.bisect_left(a, x, lo=0, hi=len(a), *, key=None) Locate the insertion point for *x* in *a* to maintain sorted order. The parameters *lo* and *hi* may be used to specify a subset of the list which should be considered; by default the entire list is used. If *x* is already present in *a*, the insertion point will be before (to the left of) any existing entries. The return value is suitable for use as the first parameter to "list.insert()" assuming that *a* is already sorted. The returned insertion point *ip* partitions the array *a* into two slices such that "all(elem < x for elem in a[lo : ip])" is true for the left slice and "all(elem >= x for elem in a[ip : hi])" is true for the right slice. *key* specifies a *key function* of one argument that is used to extract a comparison key from each element in the array. To support searching complex records, the key function is not applied to the *x* value. If *key* is "None", the elements are compared directly and no key function is called. Changed in version 3.10: Added the *key* parameter. bisect.bisect_right(a, x, lo=0, hi=len(a), *, key=None) bisect.bisect(a, x, lo=0, hi=len(a), *, key=None) Similar to "bisect_left()", but returns an insertion point which comes after (to the right of) any existing entries of *x* in *a*. The returned insertion point *ip* partitions the array *a* into two slices such that "all(elem <= x for elem in a[lo : ip])" is true for the left slice and "all(elem > x for elem in a[ip : hi])" is true for the right slice. Changed in version 3.10: Added the *key* parameter. bisect.insort_left(a, x, lo=0, hi=len(a), *, key=None) Insert *x* in *a* in sorted order. This function first runs "bisect_left()" to locate an insertion point. Next, it runs the "insert()" method on *a* to insert *x* at the appropriate position to maintain sort order. To support inserting records in a table, the *key* function (if any) is applied to *x* for the search step but not for the insertion step. Keep in mind that the *O*(log *n*) search is dominated by the slow *O*(*n*) insertion step. Changed in version 3.10: Added the *key* parameter. bisect.insort_right(a, x, lo=0, hi=len(a), *, key=None) bisect.insort(a, x, lo=0, hi=len(a), *, key=None) Similar to "insort_left()", but inserting *x* in *a* after any existing entries of *x*. This function first runs "bisect_right()" to locate an insertion point. Next, it runs the "insert()" method on *a* to insert *x* at the appropriate position to maintain sort order. To support inserting records in a table, the *key* function (if any) is applied to *x* for the search step but not for the insertion step. 
   Keep in mind that the *O*(log *n*) search is dominated by the slow
   *O*(*n*) insertion step.

   Changed in version 3.10: Added the *key* parameter.

Performance Notes
=================

When writing time-sensitive code using *bisect()* and *insort()*,
keep these thoughts in mind:

* Bisection is effective for searching ranges of values. For locating
  specific values, dictionaries are more performant.

* The *insort()* functions are *O*(*n*) because the logarithmic
  search step is dominated by the linear time insertion step.

* The search functions are stateless and discard key function results
  after they are used. Consequently, if the search functions are used
  in a loop, the key function may be called again and again on the
  same array elements. If the key function isn’t fast, consider
  wrapping it with "functools.cache()" to avoid duplicate
  computations. Alternatively, consider searching an array of
  precomputed keys to locate the insertion point (as shown in the
  examples section below).

See also:

  * Sorted Collections is a high performance module that uses
    *bisect* to manage sorted collections of data.

  * The SortedCollection recipe uses bisect to build a full-featured
    collection class with straightforward search methods and support
    for a key-function. The keys are precomputed to save unnecessary
    calls to the key function during searches.

Searching Sorted Lists
======================

The above bisect functions are useful for finding insertion points
but can be tricky or awkward to use for common searching tasks. The
following five functions show how to transform them into the standard
lookups for sorted lists:

   def index(a, x):
       'Locate the leftmost value exactly equal to x'
       i = bisect_left(a, x)
       if i != len(a) and a[i] == x:
           return i
       raise ValueError

   def find_lt(a, x):
       'Find rightmost value less than x'
       i = bisect_left(a, x)
       if i:
           return a[i-1]
       raise ValueError

   def find_le(a, x):
       'Find rightmost value less than or equal to x'
       i = bisect_right(a, x)
       if i:
           return a[i-1]
       raise ValueError

   def find_gt(a, x):
       'Find leftmost value greater than x'
       i = bisect_right(a, x)
       if i != len(a):
           return a[i]
       raise ValueError

   def find_ge(a, x):
       'Find leftmost item greater than or equal to x'
       i = bisect_left(a, x)
       if i != len(a):
           return a[i]
       raise ValueError

Examples
========

The "bisect()" function can be useful for numeric table lookups. This
example uses "bisect()" to look up a letter grade for an exam score
(say) based on a set of ordered numeric breakpoints: 90 and up is an
‘A’, 80 to 89 is a ‘B’, and so on:

   >>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
   ...     i = bisect(breakpoints, score)
   ...     return grades[i]
   ...
   >>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
   ['F', 'A', 'C', 'C', 'B', 'A', 'A']

The "bisect()" and "insort()" functions also work with lists of
tuples. The *key* argument can serve to extract the field used for
ordering records in a table:

   >>> from collections import namedtuple
   >>> from operator import attrgetter
   >>> from bisect import bisect, insort
   >>> from pprint import pprint

   >>> Movie = namedtuple('Movie', ('name', 'released', 'director'))

   >>> movies = [
   ...     Movie('Jaws', 1975, 'Spielberg'),
   ...     Movie('Titanic', 1997, 'Cameron'),
   ...     Movie('The Birds', 1963, 'Hitchcock'),
   ...     Movie('Aliens', 1986, 'Cameron')
] >>> # Find the first movie released after 1960 >>> by_year = attrgetter('released') >>> movies.sort(key=by_year) >>> movies[bisect(movies, 1960, key=by_year)] Movie(name='The Birds', released=1963, director='Hitchcock') >>> # Insert a movie while maintaining sort order >>> romance = Movie('Love Story', 1970, 'Hiller') >>> insort(movies, romance, key=by_year) >>> pprint(movies) [Movie(name='The Birds', released=1963, director='Hitchcock'), Movie(name='Love Story', released=1970, director='Hiller'), Movie(name='Jaws', released=1975, director='Spielberg'), Movie(name='Aliens', released=1986, director='Cameron'), Movie(name='Titanic', released=1997, director='Cameron')] If the key function is expensive, it is possible to avoid repeated function calls by searching a list of precomputed keys to find the index of a record: >>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)] >>> data.sort(key=lambda r: r[1]) # Or use operator.itemgetter(1). >>> keys = [r[1] for r in data] # Precompute a list of keys. >>> data[bisect_left(keys, 0)] ('black', 0) >>> data[bisect_left(keys, 1)] ('blue', 1) >>> data[bisect_left(keys, 5)] ('red', 5) >>> data[bisect_left(keys, 8)] ('yellow', 8) "builtins" — Built-in objects ***************************** ====================================================================== This module provides direct access to all ‘built-in’ identifiers of Python; for example, "builtins.open" is the full name for the built-in function "open()". This module is not normally accessed explicitly by most applications, but can be useful in modules that provide objects with the same name as a built-in value, but in which the built-in of that name is also needed. For example, in a module that wants to implement an "open()" function that wraps the built-in "open()", this module can be used directly: import builtins def open(path): f = builtins.open(path, 'r') return UpperCaser(f) class UpperCaser: '''Wrapper around a file that converts output to uppercase.''' def __init__(self, f): self._f = f def read(self, count=-1): return self._f.read(count).upper() # ... As an implementation detail, most modules have the name "__builtins__" made available as part of their globals. The value of "__builtins__" is normally either this module or the value of this module’s "__dict__" attribute. Since this is an implementation detail, it may not be used by alternate implementations of Python. See also: * Built-in Constants * Built-in Exceptions * Built-in Functions * Built-in Types "bz2" — Support for **bzip2** compression ***************************************** **Source code:** Lib/bz2.py ====================================================================== This module provides a comprehensive interface for compressing and decompressing data using the bzip2 compression algorithm. The "bz2" module contains: * The "open()" function and "BZ2File" class for reading and writing compressed files. * The "BZ2Compressor" and "BZ2Decompressor" classes for incremental (de)compression. * The "compress()" and "decompress()" functions for one-shot (de)compression. (De)compression of files ======================== bz2.open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None) Open a bzip2-compressed file in binary or text mode, returning a *file object*. As with the constructor for "BZ2File", the *filename* argument can be an actual filename (a "str" or "bytes" object), or an existing file object to read from or write to. 
   The *mode* argument can be any of "'r'", "'rb'", "'w'", "'wb'",
   "'x'", "'xb'", "'a'" or "'ab'" for binary mode, or "'rt'", "'wt'",
   "'xt'", or "'at'" for text mode. The default is "'rb'".

   The *compresslevel* argument is an integer from 1 to 9, as for the
   "BZ2File" constructor.

   For binary mode, this function is equivalent to the "BZ2File"
   constructor: "BZ2File(filename, mode,
   compresslevel=compresslevel)". In this case, the *encoding*,
   *errors* and *newline* arguments must not be provided.

   For text mode, a "BZ2File" object is created, and wrapped in an
   "io.TextIOWrapper" instance with the specified encoding, error
   handling behavior, and line ending(s).

   Added in version 3.3.

   Changed in version 3.4: The "'x'" (exclusive creation) mode was
   added.

   Changed in version 3.6: Accepts a *path-like object*.

class bz2.BZ2File(filename, mode='r', *, compresslevel=9)

   Open a bzip2-compressed file in binary mode.

   If *filename* is a "str" or "bytes" object, open the named file
   directly. Otherwise, *filename* should be a *file object*, which
   will be used to read or write the compressed data.

   The *mode* argument can be either "'r'" for reading (default),
   "'w'" for overwriting, "'x'" for exclusive creation, or "'a'" for
   appending. These can equivalently be given as "'rb'", "'wb'",
   "'xb'" and "'ab'" respectively.

   If *filename* is a file object (rather than an actual file name),
   a mode of "'w'" does not truncate the file, and is instead
   equivalent to "'a'".

   If *mode* is "'w'" or "'a'", *compresslevel* can be an integer
   between "1" and "9" specifying the level of compression: "1"
   produces the least compression, and "9" (default) produces the
   most compression.

   If *mode* is "'r'", the input file may be the concatenation of
   multiple compressed streams.

   "BZ2File" provides all of the members specified by
   "io.BufferedIOBase", except for "detach()" and "truncate()".
   Iteration and the "with" statement are supported.

   "BZ2File" also provides the following methods and attributes:

   peek([n])

      Return buffered data without advancing the file position. At
      least one byte of data will be returned (unless at EOF). The
      exact number of bytes returned is unspecified.

      Note:

        While calling "peek()" does not change the file position of
        the "BZ2File", it may change the position of the underlying
        file object (e.g. if the "BZ2File" was constructed by passing
        a file object for *filename*).

      Added in version 3.3.

   fileno()

      Return the file descriptor for the underlying file.

      Added in version 3.3.

   readable()

      Return whether the file was opened for reading.

      Added in version 3.3.

   seekable()

      Return whether the file supports seeking.

      Added in version 3.3.

   writable()

      Return whether the file was opened for writing.

      Added in version 3.3.

   read1(size=-1)

      Read up to *size* uncompressed bytes, while trying to avoid
      making multiple reads from the underlying stream. Reads up to a
      buffer’s worth of data if size is negative.

      Returns "b''" if the file is at EOF.

      Added in version 3.3.

   readinto(b)

      Read bytes into *b*. Returns the number of bytes read (0 for
      EOF).

      Added in version 3.3.

   mode

      "'rb'" for reading and "'wb'" for writing.

      Added in version 3.13.

   name

      The bzip2 file name. Equivalent to the "name" attribute of the
      underlying *file object*.

      Added in version 3.13.

   Changed in version 3.1: Support for the "with" statement was
   added.

   Changed in version 3.3: Support was added for *filename* being a
   *file object* instead of an actual filename. The "'a'" (append)
   mode was added, along with support for reading multi-stream files.
Changed in version 3.4: The "'x'" (exclusive creation) mode was added. Changed in version 3.5: The "read()" method now accepts an argument of "None". Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.9: The *buffering* parameter has been removed. It was ignored and deprecated since Python 3.0. Pass an open file object to control how the file is opened.The *compresslevel* parameter became keyword-only. Changed in version 3.10: This class is thread unsafe in the face of multiple simultaneous readers or writers, just like its equivalent classes in "gzip" and "lzma" have always been. Incremental (de)compression =========================== class bz2.BZ2Compressor(compresslevel=9) Create a new compressor object. This object may be used to compress data incrementally. For one-shot compression, use the "compress()" function instead. *compresslevel*, if given, must be an integer between "1" and "9". The default is "9". compress(data) Provide data to the compressor object. Returns a chunk of compressed data if possible, or an empty byte string otherwise. When you have finished providing data to the compressor, call the "flush()" method to finish the compression process. flush() Finish the compression process. Returns the compressed data left in internal buffers. The compressor object may not be used after this method has been called. class bz2.BZ2Decompressor Create a new decompressor object. This object may be used to decompress data incrementally. For one-shot compression, use the "decompress()" function instead. Note: This class does not transparently handle inputs containing multiple compressed streams, unlike "decompress()" and "BZ2File". If you need to decompress a multi-stream input with "BZ2Decompressor", you must use a new decompressor for each stream. decompress(data, max_length=-1) Decompress *data* (a *bytes-like object*), returning uncompressed data as bytes. Some of *data* may be buffered internally, for use in later calls to "decompress()". The returned data should be concatenated with the output of any previous calls to "decompress()". If *max_length* is nonnegative, returns at most *max_length* bytes of decompressed data. If this limit is reached and further output can be produced, the "needs_input" attribute will be set to "False". In this case, the next call to "decompress()" may provide *data* as "b''" to obtain more of the output. If all of the input data was decompressed and returned (either because this was less than *max_length* bytes, or because *max_length* was negative), the "needs_input" attribute will be set to "True". Attempting to decompress data after the end of stream is reached raises an "EOFError". Any data found after the end of the stream is ignored and saved in the "unused_data" attribute. Changed in version 3.5: Added the *max_length* parameter. eof "True" if the end-of-stream marker has been reached. Added in version 3.3. unused_data Data found after the end of the compressed stream. If this attribute is accessed before the end of the stream has been reached, its value will be "b''". needs_input "False" if the "decompress()" method can provide more decompressed data before requiring new uncompressed input. Added in version 3.5. One-shot (de)compression ======================== bz2.compress(data, compresslevel=9) Compress *data*, a *bytes-like object*. *compresslevel*, if given, must be an integer between "1" and "9". The default is "9". For incremental compression, use a "BZ2Compressor" instead. 
bz2.decompress(data)

   Decompress *data*, a *bytes-like object*.

   If *data* is the concatenation of multiple compressed streams,
   decompress all of the streams.

   For incremental decompression, use a "BZ2Decompressor" instead.

   Changed in version 3.3: Support for multi-stream inputs was added.

Examples of usage
=================

Below are some examples of typical usage of the "bz2" module.

Using "compress()" and "decompress()" to demonstrate round-trip
compression:

   >>> import bz2

   >>> data = b"""\
   ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
   ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
   ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
   ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
   ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
   ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
   ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""

   >>> c = bz2.compress(data)
   >>> len(data) / len(c)  # Data compression ratio
   1.513595166163142

   >>> d = bz2.decompress(c)
   >>> data == d  # Check equality to original object after round-trip
   True

Using "BZ2Compressor" for incremental compression:

   >>> import bz2

   >>> def gen_data(chunks=10, chunksize=1000):
   ...     """Yield incremental blocks of chunksize bytes."""
   ...     for _ in range(chunks):
   ...         yield b"z" * chunksize
   ...
   >>> comp = bz2.BZ2Compressor()
   >>> out = b""
   >>> for chunk in gen_data():
   ...     # Provide data to the compressor object
   ...     out = out + comp.compress(chunk)
   ...
   >>> # Finish the compression process.  Call this once you have
   >>> # finished providing data to the compressor.
   >>> out = out + comp.flush()

The example above uses a very “nonrandom” stream of data (a stream of
"b"z"" chunks). Random data tends to compress poorly, while ordered,
repetitive data usually yields a high compression ratio.

Writing and reading a bzip2-compressed file in binary mode:

   >>> import bz2

   >>> data = b"""\
   ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
   ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
   ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
   ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
   ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
   ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
   ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""

   >>> with bz2.open("myfile.bz2", "wb") as f:
   ...     # Write compressed data to file
   ...     unused = f.write(data)
   ...
   >>> with bz2.open("myfile.bz2", "rb") as f:
   ...     # Decompress data from file
   ...     content = f.read()
   ...
   >>> content == data  # Check equality to original object after round-trip
   True

"calendar" — General calendar-related functions
***********************************************

**Source code:** Lib/calendar.py

======================================================================

This module allows you to output calendars like the Unix **cal**
program, and provides additional useful functions related to the
calendar. By default, these calendars have Monday as the first day of
the week, and Sunday as the last (the European convention). Use
"setfirstweekday()" to set the first day of the week to Sunday (6) or
to any other weekday. Parameters that specify dates are given as
integers. For related functionality, see also the "datetime" and
"time" modules.
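To make the integer conventions concrete, here is a small
doctest-style sketch using two functions documented later in this
section, "weekday()" and "monthrange()":

   >>> import calendar
   >>> calendar.weekday(2000, 1, 1)   # Saturday (Monday is 0)
   5
   >>> calendar.monthrange(2000, 1)   # (weekday of the 1st, days in month)
   (5, 31)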
The functions and classes defined in this module use an idealized
calendar, the current Gregorian calendar extended indefinitely in
both directions. This matches the definition of the “proleptic
Gregorian” calendar in Dershowitz and Reingold’s book “Calendrical
Calculations”, where it’s the base calendar for all computations.
Zero and negative years are interpreted as prescribed by the ISO 8601
standard. Year 0 is 1 BC, year -1 is 2 BC, and so on.

class calendar.Calendar(firstweekday=0)

   Creates a "Calendar" object. *firstweekday* is an integer
   specifying the first day of the week. "MONDAY" is "0" (the
   default), "SUNDAY" is "6".

   A "Calendar" object provides several methods that can be used for
   preparing the calendar data for formatting. This class doesn’t do
   any formatting itself. This is the job of subclasses.

   "Calendar" instances have the following methods and attributes:

   firstweekday

      The first weekday as an integer (0–6).

      This property can also be set and read using
      "setfirstweekday()" and "getfirstweekday()" respectively.

   getfirstweekday()

      Return an "int" for the current first weekday (0–6).

      Identical to reading the "firstweekday" property.

   setfirstweekday(firstweekday)

      Set the first weekday to *firstweekday*, passed as an "int"
      (0–6).

      Identical to setting the "firstweekday" property.

   iterweekdays()

      Return an iterator for the week day numbers that will be used
      for one week. The first value from the iterator will be the
      same as the value of the "firstweekday" property.

   itermonthdates(year, month)

      Return an iterator for the month *month* (1–12) in the year
      *year*. This iterator will return all days (as "datetime.date"
      objects) for the month and all days before the start of the
      month or after the end of the month that are required to get a
      complete week.

   itermonthdays(year, month)

      Return an iterator for the month *month* in the year *year*
      similar to "itermonthdates()", but not restricted by the
      "datetime.date" range. Days returned will simply be day of the
      month numbers. For the days outside of the specified month, the
      day number is "0".

   itermonthdays2(year, month)

      Return an iterator for the month *month* in the year *year*
      similar to "itermonthdates()", but not restricted by the
      "datetime.date" range. Days returned will be tuples consisting
      of a day of the month number and a week day number.

   itermonthdays3(year, month)

      Return an iterator for the month *month* in the year *year*
      similar to "itermonthdates()", but not restricted by the
      "datetime.date" range. Days returned will be tuples consisting
      of a year, a month and a day of the month numbers.

      Added in version 3.7.

   itermonthdays4(year, month)

      Return an iterator for the month *month* in the year *year*
      similar to "itermonthdates()", but not restricted by the
      "datetime.date" range. Days returned will be tuples consisting
      of a year, a month, a day of the month, and a day of the week
      numbers.

      Added in version 3.7.

   monthdatescalendar(year, month)

      Return a list of the weeks in the month *month* of the *year*
      as full weeks. Weeks are lists of seven "datetime.date"
      objects.

   monthdays2calendar(year, month)

      Return a list of the weeks in the month *month* of the *year*
      as full weeks. Weeks are lists of seven tuples of day numbers
      and weekday numbers.

   monthdayscalendar(year, month)

      Return a list of the weeks in the month *month* of the *year*
      as full weeks. Weeks are lists of seven day numbers.

   yeardatescalendar(year, width=3)

      Return the data for the specified year ready for formatting.
      The return value is a list of month rows.
      Each month row contains up to *width* months (defaulting to 3).
      Each month contains between 4 and 6 weeks and each week
      contains 1–7 days. Days are "datetime.date" objects.

   yeardays2calendar(year, width=3)

      Return the data for the specified year ready for formatting
      (similar to "yeardatescalendar()"). Entries in the week lists
      are tuples of day numbers and weekday numbers. Day numbers
      outside this month are zero.

   yeardayscalendar(year, width=3)

      Return the data for the specified year ready for formatting
      (similar to "yeardatescalendar()"). Entries in the week lists
      are day numbers. Day numbers outside this month are zero.

class calendar.TextCalendar(firstweekday=0)

   This class can be used to generate plain text calendars.

   "TextCalendar" instances have the following methods:

   formatday(theday, weekday, width)

      Return a string representing a single day formatted with the
      given *width*. If *theday* is "0", return a string of spaces of
      the specified width, representing an empty day. The *weekday*
      parameter is unused.

   formatweek(theweek, w=0)

      Return a single week in a string with no newline. If *w* is
      provided, it specifies the width of the date columns, which are
      centered. Depends on the first weekday as specified in the
      constructor or set by the "setfirstweekday()" method.

   formatweekday(weekday, width)

      Return a string representing the name of a single weekday
      formatted to the specified *width*. The *weekday* parameter is
      an integer representing the day of the week, where "0" is
      Monday and "6" is Sunday.

   formatweekheader(width)

      Return a string containing the header row of weekday names,
      formatted with the given *width* for each column. The names
      depend on the locale settings and are padded to the specified
      width.

   formatmonth(theyear, themonth, w=0, l=0)

      Return a month’s calendar in a multi-line string. If *w* is
      provided, it specifies the width of the date columns, which are
      centered. If *l* is given, it specifies the number of lines
      that each week will use. Depends on the first weekday as
      specified in the constructor or set by the "setfirstweekday()"
      method.

   formatmonthname(theyear, themonth, width=0, withyear=True)

      Return a string representing the month’s name centered within
      the specified *width*. If *withyear* is "True", include the
      year in the output. The *theyear* and *themonth* parameters
      specify the year and month for the name to be formatted
      respectively.

   prmonth(theyear, themonth, w=0, l=0)

      Print a month’s calendar as returned by "formatmonth()".

   formatyear(theyear, w=2, l=1, c=6, m=3)

      Return an *m*-column calendar for an entire year as a
      multi-line string. Optional parameters *w*, *l*, and *c* are
      for date column width, lines per week, and number of spaces
      between month columns, respectively. Depends on the first
      weekday as specified in the constructor or set by the
      "setfirstweekday()" method. The earliest year for which a
      calendar can be generated is platform-dependent.

   pryear(theyear, w=2, l=1, c=6, m=3)

      Print the calendar for an entire year as returned by
      "formatyear()".

class calendar.HTMLCalendar(firstweekday=0)

   This class can be used to generate HTML calendars.

   "HTMLCalendar" instances have the following methods:

   formatmonth(theyear, themonth, withyear=True)

      Return a month’s calendar as an HTML table. If *withyear* is
      true the year will be included in the header, otherwise just
      the month name will be used.

   formatyear(theyear, width=3)

      Return a year’s calendar as an HTML table. *width* (defaulting
      to 3) specifies the number of months per row.
   formatyearpage(theyear, width=3, css='calendar.css', encoding=None)

      Return a year’s calendar as a complete HTML page. *width*
      (defaulting to 3) specifies the number of months per row. *css*
      is the name for the cascading style sheet to be used. "None"
      can be passed if no style sheet should be used. *encoding*
      specifies the encoding to be used for the output (defaulting to
      the system default encoding).

   formatmonthname(theyear, themonth, withyear=True)

      Return a month name as an HTML table row. If *withyear* is true
      the year will be included in the row, otherwise just the month
      name will be used.

   "HTMLCalendar" has the following attributes you can override to
   customize the CSS classes used by the calendar:

   cssclasses

      A list of CSS classes used for each weekday. The default class
      list is:

         cssclasses = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

      More styles can be added for each day:

         cssclasses = ["mon text-bold", "tue", "wed", "thu", "fri", "sat", "sun red"]

      Note that the length of this list must be seven items.

   cssclass_noday

      The CSS class for a weekday occurring in the previous or coming
      month.

      Added in version 3.7.

   cssclasses_weekday_head

      A list of CSS classes used for weekday names in the header row.
      The default is the same as "cssclasses".

      Added in version 3.7.

   cssclass_month_head

      The month’s head CSS class (used by "formatmonthname()"). The
      default value is ""month"".

      Added in version 3.7.

   cssclass_month

      The CSS class for the whole month’s table (used by
      "formatmonth()"). The default value is ""month"".

      Added in version 3.7.

   cssclass_year

      The CSS class for the whole year’s table of tables (used by
      "formatyear()"). The default value is ""year"".

      Added in version 3.7.

   cssclass_year_head

      The CSS class for the table head for the whole year (used by
      "formatyear()"). The default value is ""year"".

      Added in version 3.7.

   Note that although the naming for the above described class
   attributes is singular (e.g. "cssclass_month", "cssclass_noday"),
   one can replace the single CSS class with a space separated list
   of CSS classes, for example: "text-bold text-red"

   Here is an example of how "HTMLCalendar" can be customized:

      class CustomHTMLCal(calendar.HTMLCalendar):
          cssclasses = [style + " text-nowrap" for style in
                        calendar.HTMLCalendar.cssclasses]
          cssclass_month_head = "text-center month-head"
          cssclass_month = "text-center month"
          cssclass_year = "text-italic lead"

class calendar.LocaleTextCalendar(firstweekday=0, locale=None)

   This subclass of "TextCalendar" can be passed a locale name in the
   constructor and will return month and weekday names in the
   specified locale.

class calendar.LocaleHTMLCalendar(firstweekday=0, locale=None)

   This subclass of "HTMLCalendar" can be passed a locale name in the
   constructor and will return month and weekday names in the
   specified locale.

Note:

  The constructor, "formatweekday()" and "formatmonthname()" methods
  of these two classes temporarily change the "LC_TIME" locale to the
  given *locale*. Because the current locale is a process-wide
  setting, they are not thread-safe.

For simple text calendars this module provides the following
functions.

calendar.setfirstweekday(weekday)

   Sets the weekday ("0" is Monday, "6" is Sunday) to start each
   week. The values "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
   "FRIDAY", "SATURDAY", and "SUNDAY" are provided for convenience.
   For example, to set the first weekday to Sunday:

      import calendar
      calendar.setfirstweekday(calendar.SUNDAY)

calendar.firstweekday()

   Returns the current setting for the weekday to start each week.
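   For example, after setting the first weekday to Sunday as shown
   above, reading the setting back returns the corresponding integer:

      >>> import calendar
      >>> calendar.setfirstweekday(calendar.SUNDAY)
      >>> calendar.firstweekday()
      6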
calendar.isleap(year)

   Returns "True" if *year* is a leap year, otherwise "False".

calendar.leapdays(y1, y2)

   Returns the number of leap years in the range from *y1* to *y2*
   (exclusive), where *y1* and *y2* are years.

   This function works for ranges spanning a century change.

calendar.weekday(year, month, day)

   Returns the day of the week ("0" is Monday) for *year*
   ("1970"–…), *month* ("1"–"12"), *day* ("1"–"31").

calendar.weekheader(n)

   Return a header containing abbreviated weekday names. *n*
   specifies the width in characters for one weekday.

calendar.monthrange(year, month)

   Returns weekday of first day of the month and number of days in
   month, for the specified *year* and *month*.

calendar.monthcalendar(year, month)

   Returns a matrix representing a month’s calendar. Each row
   represents a week; days outside of the month are represented by
   zeros. Each week begins with Monday unless set by
   "setfirstweekday()".

calendar.prmonth(theyear, themonth, w=0, l=0)

   Prints a month’s calendar as returned by "month()".

calendar.month(theyear, themonth, w=0, l=0)

   Returns a month’s calendar in a multi-line string using the
   "formatmonth()" of the "TextCalendar" class.

calendar.prcal(year, w=0, l=0, c=6, m=3)

   Prints the calendar for an entire year as returned by
   "calendar()".

calendar.calendar(year, w=2, l=1, c=6, m=3)

   Returns a 3-column calendar for an entire year as a multi-line
   string using the "formatyear()" of the "TextCalendar" class.

calendar.timegm(tuple)

   An unrelated but handy function that takes a time tuple such as
   returned by the "gmtime()" function in the "time" module, and
   returns the corresponding Unix timestamp value, assuming an epoch
   of 1970, and the POSIX encoding. In fact, "time.gmtime()" and
   "timegm()" are each other’s inverse.

The "calendar" module exports the following data attributes:

calendar.day_name

   A sequence that represents the days of the week in the current
   locale, where Monday is day number 0.

      >>> import calendar
      >>> list(calendar.day_name)
      ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

calendar.day_abbr

   A sequence that represents the abbreviated days of the week in the
   current locale, where Mon is day number 0.

      >>> import calendar
      >>> list(calendar.day_abbr)
      ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

calendar.MONDAY
calendar.TUESDAY
calendar.WEDNESDAY
calendar.THURSDAY
calendar.FRIDAY
calendar.SATURDAY
calendar.SUNDAY

   Aliases for the days of the week, where "MONDAY" is "0" and
   "SUNDAY" is "6".

   Added in version 3.12.

class calendar.Day

   Enumeration defining days of the week as integer constants. The
   members of this enumeration are exported to the module scope as
   "MONDAY" through "SUNDAY".

   Added in version 3.12.

calendar.month_name

   A sequence that represents the months of the year in the current
   locale. This follows normal convention of January being month
   number 1, so it has a length of 13 and "month_name[0]" is the
   empty string.

      >>> import calendar
      >>> list(calendar.month_name)
      ['', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

calendar.month_abbr

   A sequence that represents the abbreviated months of the year in
   the current locale. This follows normal convention of January
   being month number 1, so it has a length of 13 and "month_abbr[0]"
   is the empty string.
>>> import calendar >>> list(calendar.month_abbr) ['', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] calendar.JANUARY calendar.FEBRUARY calendar.MARCH calendar.APRIL calendar.MAY calendar.JUNE calendar.JULY calendar.AUGUST calendar.SEPTEMBER calendar.OCTOBER calendar.NOVEMBER calendar.DECEMBER Aliases for the months of the year, where "JANUARY" is "1" and "DECEMBER" is "12". Added in version 3.12. class calendar.Month Enumeration defining months of the year as integer constants. The members of this enumeration are exported to the module scope as "JANUARY" through "DECEMBER". Added in version 3.12. The "calendar" module defines the following exceptions: exception calendar.IllegalMonthError(month) A subclass of "ValueError", raised when the given month number is outside of the range 1-12 (inclusive). month The invalid month number. exception calendar.IllegalWeekdayError(weekday) A subclass of "ValueError", raised when the given weekday number is outside of the range 0-6 (inclusive). weekday The invalid weekday number. See also: Module "datetime" Object-oriented interface to dates and times with similar functionality to the "time" module. Module "time" Low-level time related functions. Command-Line Usage ================== Added in version 2.5. The "calendar" module can be executed as a script from the command line to interactively print a calendar. python -m calendar [-h] [-L LOCALE] [-e ENCODING] [-t {text,html}] [-w WIDTH] [-l LINES] [-s SPACING] [-m MONTHS] [-c CSS] [-f FIRST_WEEKDAY] [year] [month] For example, to print a calendar for the year 2000: $ python -m calendar 2000 2000 January February March Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su 1 2 1 2 3 4 5 6 1 2 3 4 5 3 4 5 6 7 8 9 7 8 9 10 11 12 13 6 7 8 9 10 11 12 10 11 12 13 14 15 16 14 15 16 17 18 19 20 13 14 15 16 17 18 19 17 18 19 20 21 22 23 21 22 23 24 25 26 27 20 21 22 23 24 25 26 24 25 26 27 28 29 30 28 29 27 28 29 30 31 31 April May June Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su 1 2 1 2 3 4 5 6 7 1 2 3 4 3 4 5 6 7 8 9 8 9 10 11 12 13 14 5 6 7 8 9 10 11 10 11 12 13 14 15 16 15 16 17 18 19 20 21 12 13 14 15 16 17 18 17 18 19 20 21 22 23 22 23 24 25 26 27 28 19 20 21 22 23 24 25 24 25 26 27 28 29 30 29 30 31 26 27 28 29 30 July August September Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su 1 2 1 2 3 4 5 6 1 2 3 3 4 5 6 7 8 9 7 8 9 10 11 12 13 4 5 6 7 8 9 10 10 11 12 13 14 15 16 14 15 16 17 18 19 20 11 12 13 14 15 16 17 17 18 19 20 21 22 23 21 22 23 24 25 26 27 18 19 20 21 22 23 24 24 25 26 27 28 29 30 28 29 30 31 25 26 27 28 29 30 31 October November December Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su 1 1 2 3 4 5 1 2 3 2 3 4 5 6 7 8 6 7 8 9 10 11 12 4 5 6 7 8 9 10 9 10 11 12 13 14 15 13 14 15 16 17 18 19 11 12 13 14 15 16 17 16 17 18 19 20 21 22 20 21 22 23 24 25 26 18 19 20 21 22 23 24 23 24 25 26 27 28 29 27 28 29 30 25 26 27 28 29 30 31 30 31 The following options are accepted: --help, -h Show the help message and exit. --locale LOCALE, -L LOCALE The locale to use for month and weekday names. Defaults to English. --encoding ENCODING, -e ENCODING The encoding to use for output. "--encoding" is required if "-- locale" is set. --type {text,html}, -t {text,html} Print the calendar to the terminal as text, or as an HTML document. --first-weekday FIRST_WEEKDAY, -f FIRST_WEEKDAY The weekday to start each week. Must be a number between 0 (Monday) and 6 (Sunday). Defaults to 0. Added in version 3.13. 
year The year to print the calendar for. Defaults to the current year. month The month of the specified "year" to print the calendar for. Must be a number between 1 and 12, and may only be used in text mode. Defaults to printing a calendar for the full year. *Text-mode options:* --width WIDTH, -w WIDTH The width of the date column in terminal columns. The date is printed centred in the column. Any value lower than 2 is ignored. Defaults to 2. --lines LINES, -l LINES The number of lines for each week in terminal rows. The date is printed top-aligned. Any value lower than 1 is ignored. Defaults to 1. --spacing SPACING, -s SPACING The space between months in columns. Any value lower than 2 is ignored. Defaults to 6. --months MONTHS, -m MONTHS The number of months printed per row. Defaults to 3. *HTML-mode options:* --css CSS, -c CSS The path of a CSS stylesheet to use for the calendar. This must either be relative to the generated HTML, or an absolute HTTP or "file:///" URL. "cgi" — Common Gateway Interface support **************************************** Deprecated since version 3.11, removed in version 3.13. This module is no longer part of the Python standard library. It was removed in Python 3.13 after being deprecated in Python 3.11. The removal was decided in **PEP 594**. A fork of the module on PyPI can be used instead: legacy-cgi. This is a copy of the cgi module, no longer maintained or supported by the core Python team. The last version of Python that provided the "cgi" module was Python 3.12. "cgitb" — Traceback manager for CGI scripts ******************************************* Deprecated since version 3.11, removed in version 3.13. This module is no longer part of the Python standard library. It was removed in Python 3.13 after being deprecated in Python 3.11. The removal was decided in **PEP 594**. A fork of the module on PyPI can now be used instead: legacy-cgi. This is a copy of the cgi module, no longer maintained or supported by the core Python team. The last version of Python that provided the "cgitb" module was Python 3.12. "chunk" — Read IFF chunked data ******************************* Deprecated since version 3.11, removed in version 3.13. This module is no longer part of the Python standard library. It was removed in Python 3.13 after being deprecated in Python 3.11. The removal was decided in **PEP 594**. The last version of Python that provided the "chunk" module was Python 3.12. "cmath" — Mathematical functions for complex numbers **************************************************** ====================================================================== This module provides access to mathematical functions for complex numbers. The functions in this module accept integers, floating-point numbers or complex numbers as arguments. They will also accept any Python object that has either a "__complex__()" or a "__float__()" method: these methods are used to convert the object to a complex or floating-point number, respectively, and the function is then applied to the result of the conversion. Note: For functions involving branch cuts, we have the problem of deciding how to define those functions on the cut itself. 
Following Kahan’s “Branch cuts for complex elementary functions”
paper, as well as Annex G of C99 and later C standards, we use the
sign of zero to distinguish one side of the branch cut from the
other: for a branch cut along (a portion of) the real axis we look at
the sign of the imaginary part, while for a branch cut along the
imaginary axis we look at the sign of the real part.

For example, the "cmath.sqrt()" function has a branch cut along the
negative real axis. An argument of "complex(-2.0, -0.0)" is treated
as though it lies *below* the branch cut, and so gives a result on
the negative imaginary axis:

   >>> cmath.sqrt(complex(-2.0, -0.0))
   -1.4142135623730951j

But an argument of "complex(-2.0, 0.0)" is treated as though it lies
above the branch cut:

   >>> cmath.sqrt(complex(-2.0, 0.0))
   1.4142135623730951j

+--------------------------------------+----------------------------------------------------------------------+
| **Conversions to and from polar coordinates**                                                                |
+--------------------------------------+----------------------------------------------------------------------+
| "phase(z)"                           | Return the phase of *z*                                              |
+--------------------------------------+----------------------------------------------------------------------+
| "polar(z)"                           | Return the representation of *z* in polar coordinates                |
+--------------------------------------+----------------------------------------------------------------------+
| "rect(r, phi)"                       | Return the complex number *z* with polar coordinates *r* and *phi*   |
+--------------------------------------+----------------------------------------------------------------------+
| **Power and logarithmic functions**                                                                          |
+--------------------------------------+----------------------------------------------------------------------+
| "exp(z)"                             | Return *e* raised to the power *z*                                   |
+--------------------------------------+----------------------------------------------------------------------+
| "log(z[, base])"                     | Return the logarithm of *z* to the given *base* (*e* by default)     |
+--------------------------------------+----------------------------------------------------------------------+
| "log10(z)"                           | Return the base-10 logarithm of *z*                                  |
+--------------------------------------+----------------------------------------------------------------------+
| "sqrt(z)"                            | Return the square root of *z*                                        |
+--------------------------------------+----------------------------------------------------------------------+
| **Trigonometric functions**                                                                                  |
+--------------------------------------+----------------------------------------------------------------------+
| "acos(z)"                            | Return the arc cosine of *z*                                         |
+--------------------------------------+----------------------------------------------------------------------+
| "asin(z)"                            | Return the arc sine of *z*                                           |
+--------------------------------------+----------------------------------------------------------------------+
| "atan(z)"                            | Return the arc tangent of *z*                                        |
+--------------------------------------+----------------------------------------------------------------------+
| "cos(z)"                             | Return the cosine of *z*                                             |
+--------------------------------------+----------------------------------------------------------------------+
| "sin(z)"                             | Return the sine of *z*                                               |
+--------------------------------------+----------------------------------------------------------------------+
| "tan(z)"                             | Return the tangent of *z*                                            |
+--------------------------------------+----------------------------------------------------------------------+
| **Hyperbolic functions**                                                                                     |
+--------------------------------------+----------------------------------------------------------------------+
| "acosh(z)"                           | Return the inverse hyperbolic cosine of *z*                          |
+--------------------------------------+----------------------------------------------------------------------+
| "asinh(z)"                           | Return the inverse hyperbolic sine of *z*                            |
+--------------------------------------+----------------------------------------------------------------------+
| "atanh(z)"                           | Return the inverse hyperbolic tangent of *z*                         |
+--------------------------------------+----------------------------------------------------------------------+
| "cosh(z)"                            | Return the hyperbolic cosine of *z*                                  |
+--------------------------------------+----------------------------------------------------------------------+
| "sinh(z)"                            | Return the hyperbolic sine of *z*                                    |
+--------------------------------------+----------------------------------------------------------------------+
| "tanh(z)"                            | Return the hyperbolic tangent of *z*                                 |
+--------------------------------------+----------------------------------------------------------------------+
| **Classification functions**                                                                                 |
+--------------------------------------+----------------------------------------------------------------------+
| "isfinite(z)"                        | Check if all components of *z* are finite                            |
+--------------------------------------+----------------------------------------------------------------------+
| "isinf(z)"                           | Check if any component of *z* is infinite                            |
+--------------------------------------+----------------------------------------------------------------------+
| "isnan(z)"                           | Check if any component of *z* is a NaN                               |
+--------------------------------------+----------------------------------------------------------------------+
| "isclose(a, b, *, rel_tol, abs_tol)" | Check if the values *a* and *b* are close to each other              |
+--------------------------------------+----------------------------------------------------------------------+
| **Constants**                                                                                                |
+--------------------------------------+----------------------------------------------------------------------+
| "pi"                                 | *π* = 3.141592…                                                      |
+--------------------------------------+----------------------------------------------------------------------+
| "e"                                  | *e* = 2.718281…                                                      |
+--------------------------------------+----------------------------------------------------------------------+
| "tau"                                | *τ* = 2*π* = 6.283185…                                               |
+--------------------------------------+----------------------------------------------------------------------+
| "inf"                                | Positive infinity                                                    |
+--------------------------------------+----------------------------------------------------------------------+
| "infj"                               | Pure imaginary infinity                                              |
+--------------------------------------+----------------------------------------------------------------------+
| "nan"                                | “Not a number” (NaN)                                                 |
+--------------------------------------+----------------------------------------------------------------------+
| "nanj"                               | Pure imaginary NaN                                                   |
+--------------------------------------+----------------------------------------------------------------------+

Conversions to and from polar coordinates
=========================================

A Python complex number "z" is stored internally using *rectangular*
or *Cartesian* coordinates. It is completely determined by its *real
part* "z.real" and its *imaginary part* "z.imag".

*Polar coordinates* give an alternative way to represent a complex
number. In polar coordinates, a complex number *z* is defined by the
modulus *r* and the phase angle *phi*. The modulus *r* is the
distance from *z* to the origin, while the phase *phi* is the
counterclockwise angle, measured in radians, from the positive
x-axis to the line segment that joins the origin to *z*.

The following functions can be used to convert from the native
rectangular coordinates to polar coordinates and back.

cmath.phase(z)

   Return the phase of *z* (also known as the *argument* of *z*), as
   a float. "phase(z)" is equivalent to "math.atan2(z.imag,
   z.real)". The result lies in the range [-*π*, *π*], and the branch
   cut for this operation lies along the negative real axis. The sign
   of the result is the same as the sign of "z.imag", even when
   "z.imag" is zero:

      >>> phase(complex(-1.0, 0.0))
      3.141592653589793
      >>> phase(complex(-1.0, -0.0))
      -3.141592653589793

Note:

  The modulus (absolute value) of a complex number *z* can be
  computed using the built-in "abs()" function. There is no separate
  "cmath" module function for this operation.

cmath.polar(z)

   Return the representation of *z* in polar coordinates. Returns a
   pair "(r, phi)" where *r* is the modulus of *z* and *phi* is the
   phase of *z*. "polar(z)" is equivalent to "(abs(z), phase(z))".

cmath.rect(r, phi)

   Return the complex number *z* with polar coordinates *r* and
   *phi*. Equivalent to "complex(r * math.cos(phi), r *
   math.sin(phi))".

Power and logarithmic functions
===============================

cmath.exp(z)

   Return *e* raised to the power *z*, where *e* is the base of
   natural logarithms.

cmath.log(z[, base])

   Return the logarithm of *z* to the given *base*. If the *base* is
   not specified, returns the natural logarithm of *z*. There is one
   branch cut, from 0 along the negative real axis to -∞.

cmath.log10(z)

   Return the base-10 logarithm of *z*. This has the same branch cut
   as "log()".

cmath.sqrt(z)

   Return the square root of *z*. This has the same branch cut as
   "log()".

Trigonometric functions
=======================

cmath.acos(z)

   Return the arc cosine of *z*. There are two branch cuts: One
   extends right from 1 along the real axis to ∞. The other extends
   left from -1 along the real axis to -∞.

cmath.asin(z)

   Return the arc sine of *z*. This has the same branch cuts as
   "acos()".

cmath.atan(z)

   Return the arc tangent of *z*. There are two branch cuts: One
   extends from "1j" along the imaginary axis to "∞j". The other
   extends from "-1j" along the imaginary axis to "-∞j".

cmath.cos(z)

   Return the cosine of *z*.

cmath.sin(z)

   Return the sine of *z*.

cmath.tan(z)

   Return the tangent of *z*.

Hyperbolic functions
====================

cmath.acosh(z)

   Return the inverse hyperbolic cosine of *z*. There is one branch
   cut, extending left from 1 along the real axis to -∞.

cmath.asinh(z)

   Return the inverse hyperbolic sine of *z*. There are two branch
   cuts: One extends from "1j" along the imaginary axis to "∞j".
The other extends from "-1j" along the imaginary axis to "-∞j". cmath.atanh(z) Return the inverse hyperbolic tangent of *z*. There are two branch cuts: One extends from "1" along the real axis to "∞". The other extends from "-1" along the real axis to "-∞". cmath.cosh(z) Return the hyperbolic cosine of *z*. cmath.sinh(z) Return the hyperbolic sine of *z*. cmath.tanh(z) Return the hyperbolic tangent of *z*. Classification functions ======================== cmath.isfinite(z) Return "True" if both the real and imaginary parts of *z* are finite, and "False" otherwise. Added in version 3.2. cmath.isinf(z) Return "True" if either the real or the imaginary part of *z* is an infinity, and "False" otherwise. cmath.isnan(z) Return "True" if either the real or the imaginary part of *z* is a NaN, and "False" otherwise. cmath.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) Return "True" if the values *a* and *b* are close to each other and "False" otherwise. Whether or not two values are considered close is determined according to given absolute and relative tolerances. If no errors occur, the result will be: "abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)". *rel_tol* is the relative tolerance – it is the maximum allowed difference between *a* and *b*, relative to the larger absolute value of *a* or *b*. For example, to set a tolerance of 5%, pass "rel_tol=0.05". The default tolerance is "1e-09", which assures that the two values are the same within about 9 decimal digits. *rel_tol* must be nonnegative and less than "1.0". *abs_tol* is the absolute tolerance; it defaults to "0.0" and it must be nonnegative. When comparing "x" to "0.0", "isclose(x, 0)" is computed as "abs(x) <= rel_tol * abs(x)", which is "False" for any "x" and rel_tol less than "1.0". So add an appropriate positive abs_tol argument to the call. The IEEE 754 special values of "NaN", "inf", and "-inf" will be handled according to IEEE rules. Specifically, "NaN" is not considered close to any other value, including "NaN". "inf" and "-inf" are only considered close to themselves. Added in version 3.5. See also: **PEP 485** – A function for testing approximate equality Constants ========= cmath.pi The mathematical constant *π*, as a float. cmath.e The mathematical constant *e*, as a float. cmath.tau The mathematical constant *τ*, as a float. Added in version 3.6. cmath.inf Floating-point positive infinity. Equivalent to "float('inf')". Added in version 3.6. cmath.infj Complex number with zero real part and positive infinity imaginary part. Equivalent to "complex(0.0, float('inf'))". Added in version 3.6. cmath.nan A floating-point “not a number” (NaN) value. Equivalent to "float('nan')". Added in version 3.6. cmath.nanj Complex number with zero real part and NaN imaginary part. Equivalent to "complex(0.0, float('nan'))". Added in version 3.6. Note that the selection of functions is similar, but not identical, to that in module "math". The reason for having two modules is that some users aren’t interested in complex numbers, and perhaps don’t even know what they are. They would rather have "math.sqrt(-1)" raise an exception than return a complex number. Also note that the functions defined in "cmath" always return a complex number, even if the answer can be expressed as a real number (in which case the complex number has an imaginary part of zero). A note on branch cuts: They are curves along which the given function fails to be continuous. They are a necessary feature of many complex functions. 
It is assumed that if you need to compute with complex functions, you will understand branch cuts. Consult almost any (not too elementary) book on complex variables for enlightenment. For information on the proper choice of branch cuts for numerical purposes, a good reference is the following: See also: Kahan, W: Branch cuts for complex elementary functions; or, Much ado about nothing’s sign bit. In Iserles, A., and Powell, M. (eds.), The state of the art in numerical analysis. Clarendon Press (1987) pp165–211. "cmd" — Support for line-oriented command interpreters ****************************************************** **Source code:** Lib/cmd.py ====================================================================== The "Cmd" class provides a simple framework for writing line-oriented command interpreters. These are often useful for test harnesses, administrative tools, and prototypes that will later be wrapped in a more sophisticated interface. class cmd.Cmd(completekey='tab', stdin=None, stdout=None) A "Cmd" instance or subclass instance is a line-oriented interpreter framework. There is no good reason to instantiate "Cmd" itself; rather, it’s useful as a superclass of an interpreter class you define yourself in order to inherit "Cmd"’s methods and encapsulate action methods. The optional argument *completekey* is the "readline" name of a completion key; it defaults to "Tab". If *completekey* is not "None" and "readline" is available, command completion is done automatically. The default, "'tab'", is treated specially, so that it refers to the "Tab" key on every "readline.backend". Specifically, if "readline.backend" is "editline", "Cmd" will use "'^I'" instead of "'tab'". Note that other values are not treated this way, and might only work with a specific backend. The optional arguments *stdin* and *stdout* specify the input and output file objects that the Cmd instance or subclass instance will use for input and output. If not specified, they will default to "sys.stdin" and "sys.stdout". If you want a given *stdin* to be used, make sure to set the instance’s "use_rawinput" attribute to "False", otherwise *stdin* will be ignored. Changed in version 3.13: "completekey='tab'" is replaced by "'^I'" for "editline". Cmd Objects =========== A "Cmd" instance has the following methods: Cmd.cmdloop(intro=None) Repeatedly issue a prompt, accept input, parse an initial prefix off the received input, and dispatch to action methods, passing them the remainder of the line as argument. The optional argument is a banner or intro string to be issued before the first prompt (this overrides the "intro" class attribute). If the "readline" module is loaded, input will automatically inherit **bash**-like history-list editing (e.g. "Control"-"P" scrolls back to the last command, "Control"-"N" forward to the next one, "Control"-"F" moves the cursor to the right non-destructively, "Control"-"B" moves the cursor to the left non-destructively, etc.). An end-of-file on input is passed back as the string "'EOF'". An interpreter instance will recognize a command name "foo" if and only if it has a method "do_foo()". As a special case, a line beginning with the character "'?'" is dispatched to the method "do_help()". As another special case, a line beginning with the character "'!'" is dispatched to the method "do_shell()" (if such a method is defined). This method will return when the "postcmd()" method returns a true value.
The *stop* argument to "postcmd()" is the return value from the command’s corresponding "do_*()" method. If completion is enabled, completion of command names is done automatically, and completion of command arguments is done by calling "complete_foo()" with arguments *text*, *line*, *begidx*, and *endidx*. *text* is the string prefix we are attempting to match: all returned matches must begin with it. *line* is the current input line with leading whitespace removed, *begidx* and *endidx* are the beginning and ending indexes of the prefix text, which could be used to provide different completion depending upon which position the argument is in. Cmd.do_help(arg) All subclasses of "Cmd" inherit a predefined "do_help()". This method, called with an argument "'bar'", invokes the corresponding method "help_bar()", and if that is not present, prints the docstring of "do_bar()", if available. With no argument, "do_help()" lists all available help topics (that is, all commands with corresponding "help_*()" methods or commands that have docstrings), and also lists any undocumented commands. Cmd.onecmd(str) Interpret the argument as though it had been typed in response to the prompt. This may be overridden, but should not normally need to be; see the "precmd()" and "postcmd()" methods for useful execution hooks. The return value is a flag indicating whether interpretation of commands by the interpreter should stop. If there is a "do_*()" method for the command *str*, the return value of that method is returned, otherwise the return value from the "default()" method is returned. Cmd.emptyline() Method called when an empty line is entered in response to the prompt. If this method is not overridden, it repeats the last nonempty command entered. Cmd.default(line) Method called on an input line when the command prefix is not recognized. If this method is not overridden, it prints an error message and returns. Cmd.completedefault(text, line, begidx, endidx) Method called to complete an input line when no command-specific "complete_*()" method is available. By default, it returns an empty list. Cmd.columnize(list, displaywidth=80) Method called to display a list of strings as a compact set of columns. Each column is only as wide as necessary. Columns are separated by two spaces for readability. Cmd.precmd(line) Hook method executed just before the command line *line* is interpreted, but after the input prompt is generated and issued. This method is a stub in "Cmd"; it exists to be overridden by subclasses. The return value is used as the command which will be executed by the "onecmd()" method; the "precmd()" implementation may rewrite the command or simply return *line* unchanged. Cmd.postcmd(stop, line) Hook method executed just after a command dispatch is finished. This method is a stub in "Cmd"; it exists to be overridden by subclasses. *line* is the command line which was executed, and *stop* is a flag which indicates whether execution will be terminated after the call to "postcmd()"; this will be the return value of the "onecmd()" method. The return value of this method will be used as the new value for the internal flag which corresponds to *stop*; returning false will cause interpretation to continue. Cmd.preloop() Hook method executed once when "cmdloop()" is called. This method is a stub in "Cmd"; it exists to be overridden by subclasses. Cmd.postloop() Hook method executed once when "cmdloop()" is about to return. This method is a stub in "Cmd"; it exists to be overridden by subclasses.
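Putting these methods together, a minimal interpreter might look like the following sketch (the "greet" command and its completion candidates are invented for illustration):

   import cmd

   class GreetShell(cmd.Cmd):
       prompt = '(greet) '

       def do_greet(self, arg):
           'Greet the named person:  GREET alice'
           print('Hello,', arg or 'stranger')

       def complete_greet(self, text, line, begidx, endidx):
           # Offer argument completions; every match must begin with *text*.
           names = ['alice', 'bob', 'carol']
           return [n for n in names if n.startswith(text)]

       def do_EOF(self, arg):
           'Quit on end-of-file:  EOF'
           return True   # a true value makes cmdloop() return

   if __name__ == '__main__':
       GreetShell().cmdloop()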
Instances of "Cmd" subclasses have some public instance variables: Cmd.prompt The prompt issued to solicit input. Cmd.identchars The string of characters accepted for the command prefix. Cmd.lastcmd The last nonempty command prefix seen. Cmd.cmdqueue A list of queued input lines. The cmdqueue list is checked in "cmdloop()" when new input is needed; if it is nonempty, its elements will be processed in order, as if entered at the prompt. Cmd.intro A string to issue as an intro or banner. May be overridden by giving the "cmdloop()" method an argument. Cmd.doc_header The header to issue if the help output has a section for documented commands. Cmd.misc_header The header to issue if the help output has a section for miscellaneous help topics (that is, there are "help_*()" methods without corresponding "do_*()" methods). Cmd.undoc_header The header to issue if the help output has a section for undocumented commands (that is, there are "do_*()" methods without corresponding "help_*()" methods). Cmd.ruler The character used to draw separator lines under the help-message headers. If empty, no ruler line is drawn. It defaults to "'='". Cmd.use_rawinput A flag, defaulting to true. If true, "cmdloop()" uses "input()" to display a prompt and read the next command; if false, "sys.stdout.write()" and "sys.stdin.readline()" are used. (This means that by importing "readline", on systems that support it, the interpreter will automatically support **Emacs**-like line editing and command-history keystrokes.) Cmd Example =========== The "cmd" module is mainly useful for building custom shells that let a user work with a program interactively. This section presents a simple example of how to build a shell around a few of the commands in the "turtle" module. Basic turtle commands such as "forward()" are added to a "Cmd" subclass with method named "do_forward()". The argument is converted to a number and dispatched to the turtle module. The docstring is used in the help utility provided by the shell. The example also includes a basic record and playback facility implemented with the "precmd()" method which is responsible for converting the input to lowercase and writing the commands to a file. The "do_playback()" method reads the file and adds the recorded commands to the "cmdqueue" for immediate playback: import cmd, sys from turtle import * class TurtleShell(cmd.Cmd): intro = 'Welcome to the turtle shell. Type help or ? to list commands.\n' prompt = '(turtle) ' file = None # ----- basic turtle commands ----- def do_forward(self, arg): 'Move the turtle forward by the specified distance: FORWARD 10' forward(*parse(arg)) def do_right(self, arg): 'Turn turtle right by given number of degrees: RIGHT 20' right(*parse(arg)) def do_left(self, arg): 'Turn turtle left by given number of degrees: LEFT 90' left(*parse(arg)) def do_goto(self, arg): 'Move turtle to an absolute position with changing orientation. 
GOTO 100 200' goto(*parse(arg)) def do_home(self, arg): 'Return turtle to the home position: HOME' home() def do_circle(self, arg): 'Draw circle with given radius and optional extent and steps: CIRCLE 50' circle(*parse(arg)) def do_position(self, arg): 'Print the current turtle position: POSITION' print('Current position is %d %d\n' % position()) def do_heading(self, arg): 'Print the current turtle heading in degrees: HEADING' print('Current heading is %d\n' % (heading(),)) def do_color(self, arg): 'Set the color: COLOR BLUE' color(arg.lower()) def do_undo(self, arg): 'Undo (repeatedly) the last turtle action(s): UNDO' def do_reset(self, arg): 'Clear the screen and return turtle to center: RESET' reset() def do_bye(self, arg): 'Stop recording, close the turtle window, and exit: BYE' print('Thank you for using Turtle') self.close() bye() return True # ----- record and playback ----- def do_record(self, arg): 'Save future commands to filename: RECORD rose.cmd' self.file = open(arg, 'w') def do_playback(self, arg): 'Playback commands from a file: PLAYBACK rose.cmd' self.close() with open(arg) as f: self.cmdqueue.extend(f.read().splitlines()) def precmd(self, line): line = line.lower() if self.file and 'playback' not in line: print(line, file=self.file) return line def close(self): if self.file: self.file.close() self.file = None def parse(arg): 'Convert a series of zero or more numbers to an argument tuple' return tuple(map(int, arg.split())) if __name__ == '__main__': TurtleShell().cmdloop() Here is a sample session with the turtle shell showing the help functions, using blank lines to repeat commands, and the simple record and playback facility: Welcome to the turtle shell. Type help or ? to list commands. (turtle) ? Documented commands (type help <topic>): ======================================== bye color goto home playback record right circle forward heading left position reset undo (turtle) help forward Move the turtle forward by the specified distance: FORWARD 10 (turtle) record spiral.cmd (turtle) position Current position is 0 0 (turtle) heading Current heading is 0 (turtle) reset (turtle) circle 20 (turtle) right 30 (turtle) circle 40 (turtle) right 30 (turtle) circle 60 (turtle) right 30 (turtle) circle 80 (turtle) right 30 (turtle) circle 100 (turtle) right 30 (turtle) circle 120 (turtle) right 30 (turtle) circle 120 (turtle) heading Current heading is 180 (turtle) forward 100 (turtle) (turtle) right 90 (turtle) forward 100 (turtle) (turtle) right 90 (turtle) forward 400 (turtle) right 90 (turtle) forward 500 (turtle) right 90 (turtle) forward 400 (turtle) right 90 (turtle) forward 300 (turtle) playback spiral.cmd Current position is 0 0 Current heading is 0 Current heading is 180 (turtle) bye Thank you for using Turtle Modules command-line interface (CLI) ************************************ The following modules have a command-line interface. * ast * asyncio * "base64" * calendar * "code" * compileall * "cProfile": see profile * difflib * dis * doctest * "encodings.rot_13" * "ensurepip" * "filecmp" * "fileinput" * "ftplib" * gzip * http.server * "idlelib" * inspect * json.tool * "mimetypes" * "pdb" * "pickle" * pickletools * platform * "poplib" * profile * "pstats" * py_compile * "pyclbr" * "pydoc" * "quopri" * random * "runpy" * site * sqlite3 * symtable * sysconfig * "tabnanny" * tarfile * "this" * timeit * tokenize * trace * "turtledemo" * unittest * uuid * "venv" * "webbrowser" * zipapp * zipfile See also the Python command-line interface.
Command Line Interface Libraries ******************************** The modules described in this chapter assist with implementing command line and terminal interfaces for applications. Here’s an overview: * "argparse" — Parser for command-line options, arguments and subcommands * "optparse" — Parser for command line options * "getpass" — Portable password input * "fileinput" — Iterate over lines from multiple input streams * "curses" — Terminal handling for character-cell displays * "curses.textpad" — Text input widget for curses programs * "curses.ascii" — Utilities for ASCII characters * "curses.panel" — A panel stack extension for curses "code" — Interpreter base classes ********************************* **Source code:** Lib/code.py ====================================================================== The "code" module provides facilities to implement read-eval-print loops in Python. Two classes and convenience functions are included which can be used to build applications which provide an interactive interpreter prompt. class code.InteractiveInterpreter(locals=None) This class deals with parsing and interpreter state (the user’s namespace); it does not deal with input buffering or prompting or input file naming (the filename is always passed in explicitly). The optional *locals* argument specifies a mapping to use as the namespace in which code will be executed; it defaults to a newly created dictionary with key "'__name__'" set to "'__console__'" and key "'__doc__'" set to "None". Note that function and class objects created under an "InteractiveInterpreter" instance will belong to the namespace specified by *locals*. They are only pickleable if *locals* is the namespace of an existing module. class code.InteractiveConsole(locals=None, filename='<console>', local_exit=False) Closely emulate the behavior of the interactive Python interpreter. This class builds on "InteractiveInterpreter" and adds prompting using the familiar "sys.ps1" and "sys.ps2", and input buffering. If *local_exit* is true, "exit()" and "quit()" in the console will not raise "SystemExit", but instead return to the calling code. Changed in version 3.13: Added *local_exit* parameter. code.interact(banner=None, readfunc=None, local=None, exitmsg=None, local_exit=False) Convenience function to run a read-eval-print loop. This creates a new instance of "InteractiveConsole" and sets *readfunc* to be used as the "InteractiveConsole.raw_input()" method, if provided. If *local* is provided, it is passed to the "InteractiveConsole" constructor for use as the default namespace for the interpreter loop. If *local_exit* is provided, it is passed to the "InteractiveConsole" constructor. The "interact()" method of the instance is then run with *banner* and *exitmsg* passed as the banner and exit message to use, if provided. The console object is discarded after use. Changed in version 3.6: Added *exitmsg* parameter. Changed in version 3.13: Added *local_exit* parameter. code.compile_command(source, filename='<input>', symbol='single') This function is useful for programs that want to emulate Python’s interpreter main loop (a.k.a. the read-eval-print loop). The tricky part is to determine when the user has entered an incomplete command that can be completed by entering more text (as opposed to a complete command or a syntax error). This function *almost* always makes the same decision as the real interpreter main loop.
*source* is the source string; *filename* is the optional filename from which source was read, defaulting to "'<input>'"; and *symbol* is the optional grammar start symbol, which should be "'single'" (the default), "'eval'" or "'exec'". Returns a code object (the same as "compile(source, filename, symbol)") if the command is complete and valid; "None" if the command is incomplete; raises "SyntaxError" if the command is complete and contains a syntax error, or raises "OverflowError" or "ValueError" if the command contains an invalid literal. Interactive Interpreter Objects =============================== InteractiveInterpreter.runsource(source, filename='<input>', symbol='single') Compile and run some source in the interpreter. Arguments are the same as for "compile_command()"; the default for *filename* is "'<input>'", and for *symbol* is "'single'". One of several things can happen: * The input is incorrect; "compile_command()" raised an exception ("SyntaxError" or "OverflowError"). A syntax traceback will be printed by calling the "showsyntaxerror()" method. "runsource()" returns "False". * The input is incomplete, and more input is required; "compile_command()" returned "None". "runsource()" returns "True". * The input is complete; "compile_command()" returned a code object. The code is executed by calling "runcode()" (which also handles run-time exceptions, except for "SystemExit"). "runsource()" returns "False". The return value can be used to decide whether to use "sys.ps1" or "sys.ps2" to prompt the next line. InteractiveInterpreter.runcode(code) Execute a code object. When an exception occurs, "showtraceback()" is called to display a traceback. All exceptions are caught except "SystemExit", which is allowed to propagate. A note about "KeyboardInterrupt": this exception may occur elsewhere in this code, and may not always be caught. The caller should be prepared to deal with it. InteractiveInterpreter.showsyntaxerror(filename=None) Display the syntax error that just occurred. This does not display a stack trace because there isn’t one for syntax errors. If *filename* is given, it is stuffed into the exception instead of the default filename provided by Python’s parser, because it always uses "'<string>'" when reading from a string. The output is written by the "write()" method. InteractiveInterpreter.showtraceback() Display the exception that just occurred. We remove the first stack item because it is within the interpreter object implementation. The output is written by the "write()" method. Changed in version 3.5: The full chained traceback is displayed instead of just the primary traceback. InteractiveInterpreter.write(data) Write a string to the standard error stream ("sys.stderr"). Derived classes should override this to provide the appropriate output handling as needed. Interactive Console Objects =========================== The "InteractiveConsole" class is a subclass of "InteractiveInterpreter", and so offers all the methods of the interpreter objects as well as the following additions. InteractiveConsole.interact(banner=None, exitmsg=None) Closely emulate the interactive Python console. The optional *banner* argument specifies the banner to print before the first interaction; by default it prints a banner similar to the one printed by the standard Python interpreter, followed by the class name of the console object in parentheses (so as not to confuse this with the real interpreter – since it’s so close!). The optional *exitmsg* argument specifies an exit message printed when exiting.
Pass the empty string to suppress the exit message. If *exitmsg* is not given or "None", a default message is printed. Changed in version 3.4: To suppress printing any banner, pass an empty string. Changed in version 3.6: Print an exit message when exiting. InteractiveConsole.push(line) Push a line of source text to the interpreter. The line should not have a trailing newline; it may have internal newlines. The line is appended to a buffer and the interpreter’s "runsource()" method is called with the concatenated contents of the buffer as source. If this indicates that the command was executed or invalid, the buffer is reset; otherwise, the command is incomplete, and the buffer is left as it was after the line was appended. The return value is "True" if more input is required, "False" if the line was dealt with in some way (this is the same as "runsource()"). InteractiveConsole.resetbuffer() Remove any unhandled source text from the input buffer. InteractiveConsole.raw_input(prompt='') Write a prompt and read a line. The returned line does not include the trailing newline. When the user enters the EOF key sequence, "EOFError" is raised. The base implementation reads from "sys.stdin"; a subclass may replace this with a different implementation. "codeop" — Compile Python code ****************************** **Source code:** Lib/codeop.py ====================================================================== The "codeop" module provides utilities upon which the Python read-eval-print loop can be emulated, as is done in the "code" module. As a result, you probably don’t want to use the module directly; if you want to include such a loop in your program you probably want to use the "code" module instead. There are two parts to this job: 1. Being able to tell if a line of input completes a Python statement: in short, telling whether to print ‘">>>"’ or ‘"..."’ next. 2. Remembering which future statements the user has entered, so subsequent input can be compiled with these in effect. The "codeop" module provides a way of doing each of these things, and a way of doing them both. To do just the former: codeop.compile_command(source, filename='<input>', symbol='single') Tries to compile *source*, which should be a string of Python code, and returns a code object if *source* is valid Python code. In that case, the filename attribute of the code object will be *filename*, which defaults to "'<input>'". Returns "None" if *source* is *not* valid Python code, but is a prefix of valid Python code. If there is a problem with *source*, an exception will be raised. "SyntaxError" is raised if there is invalid Python syntax, and "OverflowError" or "ValueError" if there is an invalid literal. The *symbol* argument determines whether *source* is compiled as a statement ("'single'", the default), as a sequence of *statements* ("'exec'") or as an *expression* ("'eval'"). Any other value will cause "ValueError" to be raised. Note: It is possible (but not likely) that the parser stops parsing with a successful outcome before reaching the end of the source; in this case, trailing symbols may be ignored instead of causing an error. For example, a backslash followed by two newlines may be followed by arbitrary garbage. This will be fixed once the API for the parser is better.
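For example, "compile_command()" can drive the decision of whether to prompt for more input. A small illustrative session (the exact repr of the returned code object will vary):

   >>> from codeop import compile_command
   >>> compile_command('x = 1')        # complete statement: a code object
   <code object <module> at ...>
   >>> compile_command('if x:')        # prefix of valid code: None, keep reading
   >>> compile_command('x = )')        # invalid code: raises an exception
   Traceback (most recent call last):
     ...
   SyntaxError: ...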
class codeop.Compile Instances of this class have "__call__()" methods identical in signature to the built-in function "compile()", but with the difference that if the instance compiles program text containing a "__future__" statement, the instance ‘remembers’ and compiles all subsequent program texts with the statement in force. class codeop.CommandCompiler Instances of this class have "__call__()" methods identical in signature to "compile_command()"; the difference is that if the instance compiles program text containing a "__future__" statement, the instance ‘remembers’ and compiles all subsequent program texts with the statement in force. "collections.abc" — Abstract Base Classes for Containers ******************************************************** Added in version 3.3: Formerly, this module was part of the "collections" module. **Source code:** Lib/_collections_abc.py ====================================================================== This module provides *abstract base classes* that can be used to test whether a class provides a particular interface; for example, whether it is *hashable* or whether it is a *mapping*. An "issubclass()" or "isinstance()" test for an interface works in one of three ways. 1. A newly written class can inherit directly from one of the abstract base classes. The class must supply the required abstract methods. The remaining mixin methods come from inheritance and can be overridden if desired. Other methods may be added as needed: class C(Sequence): # Direct inheritance def __init__(self): ... # Extra method not required by the ABC def __getitem__(self, index): ... # Required abstract method def __len__(self): ... # Required abstract method def count(self, value): ... # Optionally override a mixin method >>> issubclass(C, Sequence) True >>> isinstance(C(), Sequence) True 2. Existing classes and built-in classes can be registered as “virtual subclasses” of the ABCs. Those classes should define the full API including all of the abstract methods and all of the mixin methods. This lets users rely on "issubclass()" or "isinstance()" tests to determine whether the full interface is supported. The exception to this rule is for methods that are automatically inferred from the rest of the API: class D: # No inheritance def __init__(self): ... # Extra method not required by the ABC def __getitem__(self, index): ... # Abstract method def __len__(self): ... # Abstract method def count(self, value): ... # Mixin method def index(self, value): ... # Mixin method Sequence.register(D) # Register instead of inherit >>> issubclass(D, Sequence) True >>> isinstance(D(), Sequence) True In this example, class "D" does not need to define "__contains__", "__iter__", and "__reversed__" because the in-operator, the *iteration* logic, and the "reversed()" function automatically fall back to using "__getitem__" and "__len__". 3. Some simple interfaces are directly recognizable by the presence of the required methods (unless those methods have been set to "None"): class E: def __iter__(self): ... def __next__(self): ... >>> issubclass(E, Iterable) True >>> isinstance(E(), Iterable) True Complex interfaces do not support this last technique because an interface is more than just the presence of method names. Interfaces specify semantics and relationships between methods that cannot be inferred solely from the presence of specific method names. 
For example, knowing that a class supplies "__getitem__", "__len__", and "__iter__" is insufficient for distinguishing a "Sequence" from a "Mapping". Added in version 3.9: These abstract classes now support "[]". See Generic Alias Type and **PEP 585**. Collections Abstract Base Classes ================================= The collections module offers the following *ABCs*: +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | ABC | Inherits from | Abstract Methods | Mixin Methods | |================================|========================|=========================|======================================================| | "Container" [1] | | "__contains__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Hashable" [1] | | "__hash__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Iterable" [1] [2] | | "__iter__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Iterator" [1] | "Iterable" | "__next__" | "__iter__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Reversible" [1] | "Iterable" | "__reversed__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Generator" [1] | "Iterator" | "send", "throw" | "close", "__iter__", "__next__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Sized" [1] | | "__len__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Callable" [1] | | "__call__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Collection" [1] | "Sized", "Iterable", | "__contains__", | | | | "Container" | "__iter__", "__len__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Sequence" | "Reversible", | "__getitem__", | "__contains__", "__iter__", "__reversed__", "index", | | | "Collection" | "__len__" | and "count" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "MutableSequence" | "Sequence" | "__getitem__", | Inherited "Sequence" methods and "append", "clear", | | | | "__setitem__", | "reverse", "extend", "pop", "remove", and "__iadd__" | | | | "__delitem__", | | | | | "__len__", "insert" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "ByteString" | "Sequence" | "__getitem__", | Inherited "Sequence" methods | | | | "__len__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Set" | "Collection" | "__contains__", | "__le__", "__lt__", "__eq__", "__ne__", "__gt__", | | | | "__iter__", "__len__" | "__ge__", "__and__", "__or__", "__sub__", | | | | 
| "__rsub__", "__xor__", "__rxor__" and "isdisjoint" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "MutableSet" | "Set" | "__contains__", | Inherited "Set" methods and "clear", "pop", | | | | "__iter__", "__len__", | "remove", "__ior__", "__iand__", "__ixor__", and | | | | "add", "discard" | "__isub__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Mapping" | "Collection" | "__getitem__", | "__contains__", "keys", "items", "values", "get", | | | | "__iter__", "__len__" | "__eq__", and "__ne__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "MutableMapping" | "Mapping" | "__getitem__", | Inherited "Mapping" methods and "pop", "popitem", | | | | "__setitem__", | "clear", "update", and "setdefault" | | | | "__delitem__", | | | | | "__iter__", "__len__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "MappingView" | "Sized" | | "__init__", "__len__" and "__repr__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "ItemsView" | "MappingView", "Set" | | "__contains__", "__iter__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "KeysView" | "MappingView", "Set" | | "__contains__", "__iter__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "ValuesView" | "MappingView", | | "__contains__", "__iter__" | | | "Collection" | | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Awaitable" [1] | | "__await__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Coroutine" [1] | "Awaitable" | "send", "throw" | "close" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "AsyncIterable" [1] | | "__aiter__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "AsyncIterator" [1] | "AsyncIterable" | "__anext__" | "__aiter__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "AsyncGenerator" [1] | "AsyncIterator" | "asend", "athrow" | "aclose", "__aiter__", "__anext__" | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Buffer" [1] | | "__buffer__" | | +--------------------------------+------------------------+-------------------------+------------------------------------------------------+ -[ Footnotes ]- [1] These ABCs override "__subclasshook__()" to support testing an interface by verifying the required methods are present and have not been set to "None". This only works for simple interfaces. More complex interfaces require registration or direct subclassing. 
[2] Checking "isinstance(obj, Iterable)" detects classes that are registered as "Iterable" or that have an "__iter__()" method, but it does not detect classes that iterate with the "__getitem__()" method. The only reliable way to determine whether an object is *iterable* is to call "iter(obj)". Collections Abstract Base Classes – Detailed Descriptions ========================================================= class collections.abc.Container ABC for classes that provide the "__contains__()" method. class collections.abc.Hashable ABC for classes that provide the "__hash__()" method. class collections.abc.Sized ABC for classes that provide the "__len__()" method. class collections.abc.Callable ABC for classes that provide the "__call__()" method. See Annotating callable objects for details on how to use "Callable" in type annotations. class collections.abc.Iterable ABC for classes that provide the "__iter__()" method. Checking "isinstance(obj, Iterable)" detects classes that are registered as "Iterable" or that have an "__iter__()" method, but it does not detect classes that iterate with the "__getitem__()" method. The only reliable way to determine whether an object is *iterable* is to call "iter(obj)". class collections.abc.Collection ABC for sized iterable container classes. Added in version 3.6. class collections.abc.Iterator ABC for classes that provide the "__iter__()" and "__next__()" methods. See also the definition of *iterator*. class collections.abc.Reversible ABC for iterable classes that also provide the "__reversed__()" method. Added in version 3.6. class collections.abc.Generator ABC for *generator* classes that implement the protocol defined in **PEP 342** that extends *iterators* with the "send()", "throw()" and "close()" methods. See Annotating generators and coroutines for details on using "Generator" in type annotations. Added in version 3.5. class collections.abc.Sequence class collections.abc.MutableSequence class collections.abc.ByteString ABCs for read-only and mutable *sequences*. Implementation note: Some of the mixin methods, such as "__iter__()", "__reversed__()" and "index()", make repeated calls to the underlying "__getitem__()" method. Consequently, if "__getitem__()" is implemented with constant access speed, the mixin methods will have linear performance; however, if the underlying method is linear (as it would be with a linked list), the mixins will have quadratic performance and will likely need to be overridden. Changed in version 3.5: The index() method added support for *stop* and *start* arguments. Deprecated since version 3.12, will be removed in version 3.14: The "ByteString" ABC has been deprecated. For use in typing, prefer a union, like "bytes | bytearray", or "collections.abc.Buffer". For use as an ABC, prefer "Sequence" or "collections.abc.Buffer". class collections.abc.Set class collections.abc.MutableSet ABCs for read-only and mutable sets. class collections.abc.Mapping class collections.abc.MutableMapping ABCs for read-only and mutable *mappings*. class collections.abc.MappingView class collections.abc.ItemsView class collections.abc.KeysView class collections.abc.ValuesView ABCs for mapping, items, keys, and values *views*. class collections.abc.Awaitable ABC for *awaitable* objects, which can be used in "await" expressions. Custom implementations must provide the "__await__()" method. *Coroutine* objects and instances of the "Coroutine" ABC are all instances of this ABC. 
Note: In CPython, generator-based coroutines (*generators* decorated with "@types.coroutine") are *awaitables*, even though they do not have an "__await__()" method. Using "isinstance(gencoro, Awaitable)" for them will return "False". Use "inspect.isawaitable()" to detect them. Added in version 3.5. class collections.abc.Coroutine ABC for *coroutine* compatible classes. These implement the following methods, defined in Coroutine Objects: "send()", "throw()", and "close()". Custom implementations must also implement "__await__()". All "Coroutine" instances are also instances of "Awaitable". Note: In CPython, generator-based coroutines (*generators* decorated with "@types.coroutine") are *awaitables*, even though they do not have an "__await__()" method. Using "isinstance(gencoro, Coroutine)" for them will return "False". Use "inspect.isawaitable()" to detect them. See Annotating generators and coroutines for details on using "Coroutine" in type annotations. The variance and order of type parameters correspond to those of "Generator". Added in version 3.5. class collections.abc.AsyncIterable ABC for classes that provide an "__aiter__" method. See also the definition of *asynchronous iterable*. Added in version 3.5. class collections.abc.AsyncIterator ABC for classes that provide "__aiter__" and "__anext__" methods. See also the definition of *asynchronous iterator*. Added in version 3.5. class collections.abc.AsyncGenerator ABC for *asynchronous generator* classes that implement the protocol defined in **PEP 525** and **PEP 492**. See Annotating generators and coroutines for details on using "AsyncGenerator" in type annotations. Added in version 3.6. class collections.abc.Buffer ABC for classes that provide the "__buffer__()" method, implementing the buffer protocol. See **PEP 688**. Added in version 3.12. Examples and Recipes ==================== ABCs allow us to ask classes or instances if they provide particular functionality, for example: size = None if isinstance(myvar, collections.abc.Sized): size = len(myvar) Several of the ABCs are also useful as mixins that make it easier to develop classes supporting container APIs. For example, to write a class supporting the full "Set" API, it is only necessary to supply the three underlying abstract methods: "__contains__()", "__iter__()", and "__len__()". The ABC supplies the remaining methods such as "__and__()" and "isdisjoint()": class ListBasedSet(collections.abc.Set): ''' Alternate set implementation favoring space over speed and not requiring the set elements to be hashable. ''' def __init__(self, iterable): self.elements = lst = [] for value in iterable: if value not in lst: lst.append(value) def __iter__(self): return iter(self.elements) def __contains__(self, value): return value in self.elements def __len__(self): return len(self.elements) s1 = ListBasedSet('abcdef') s2 = ListBasedSet('defghi') overlap = s1 & s2 # The __and__() method is supported automatically Notes on using "Set" and "MutableSet" as a mixin: 1. Since some set operations create new sets, the default mixin methods need a way to create new instances from an *iterable*. The class constructor is assumed to have a signature in the form "ClassName(iterable)". That assumption is factored-out to an internal "classmethod" called "_from_iterable()" which calls "cls(iterable)" to produce a new set. 
If the "Set" mixin is being used in a class with a different constructor signature, you will need to override "_from_iterable()" with a classmethod or regular method that can construct new instances from an iterable argument. 2. To override the comparisons (presumably for speed, as the semantics are fixed), redefine "__le__()" and "__ge__()", then the other operations will automatically follow suit. 3. The "Set" mixin provides a "_hash()" method to compute a hash value for the set; however, "__hash__()" is not defined because not all sets are *hashable* or immutable. To add set hashability using mixins, inherit from both "Set()" and "Hashable()", then define "__hash__ = Set._hash". See also: * OrderedSet recipe for an example built on "MutableSet". * For more about ABCs, see the "abc" module and **PEP 3119**. "collections" — Container datatypes *********************************** **Source code:** Lib/collections/__init__.py ====================================================================== This module implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, "dict", "list", "set", and "tuple". +-----------------------+----------------------------------------------------------------------+ | "namedtuple()" | factory function for creating tuple subclasses with named fields | +-----------------------+----------------------------------------------------------------------+ | "deque" | list-like container with fast appends and pops on either end | +-----------------------+----------------------------------------------------------------------+ | "ChainMap" | dict-like class for creating a single view of multiple mappings | +-----------------------+----------------------------------------------------------------------+ | "Counter" | dict subclass for counting *hashable* objects | +-----------------------+----------------------------------------------------------------------+ | "OrderedDict" | dict subclass that remembers the order entries were added | +-----------------------+----------------------------------------------------------------------+ | "defaultdict" | dict subclass that calls a factory function to supply missing values | +-----------------------+----------------------------------------------------------------------+ | "UserDict" | wrapper around dictionary objects for easier dict subclassing | +-----------------------+----------------------------------------------------------------------+ | "UserList" | wrapper around list objects for easier list subclassing | +-----------------------+----------------------------------------------------------------------+ | "UserString" | wrapper around string objects for easier string subclassing | +-----------------------+----------------------------------------------------------------------+ "ChainMap" objects ================== Added in version 3.3. A "ChainMap" class is provided for quickly linking a number of mappings so they can be treated as a single unit. It is often much faster than creating a new dictionary and running multiple "update()" calls. The class can be used to simulate nested scopes and is useful in templating. class collections.ChainMap(*maps) A "ChainMap" groups multiple dicts or other mappings together to create a single, updateable view. If no *maps* are specified, a single empty dictionary is provided so that a new chain always has at least one mapping. The underlying mappings are stored in a list. 
That list is public and can be accessed or updated using the *maps* attribute. There is no other state. Lookups search the underlying mappings successively until a key is found. In contrast, writes, updates, and deletions only operate on the first mapping. A "ChainMap" incorporates the underlying mappings by reference. So, if one of the underlying mappings gets updated, those changes will be reflected in "ChainMap". All of the usual dictionary methods are supported. In addition, there is a *maps* attribute, a method for creating new subcontexts, and a property for accessing all but the first mapping: maps A user updateable list of mappings. The list is ordered from first-searched to last-searched. It is the only stored state and can be modified to change which mappings are searched. The list should always contain at least one mapping. new_child(m=None, **kwargs) Returns a new "ChainMap" containing a new map followed by all of the maps in the current instance. If "m" is specified, it becomes the new map at the front of the list of mappings; if not specified, an empty dict is used, so that a call to "d.new_child()" is equivalent to: "ChainMap({}, *d.maps)". If any keyword arguments are specified, they update the passed map or the new empty dict. This method is used for creating subcontexts that can be updated without altering values in any of the parent mappings. Changed in version 3.4: The optional "m" parameter was added. Changed in version 3.10: Keyword arguments support was added. parents Property returning a new "ChainMap" containing all of the maps in the current instance except the first one. This is useful for skipping the first map in the search. Use cases are similar to those for the "nonlocal" keyword used in *nested scopes*. The use cases also parallel those for the built-in "super()" function. A reference to "d.parents" is equivalent to: "ChainMap(*d.maps[1:])". Note, the iteration order of a "ChainMap" is determined by scanning the mappings last to first: >>> baseline = {'music': 'bach', 'art': 'rembrandt'} >>> adjustments = {'art': 'van gogh', 'opera': 'carmen'} >>> list(ChainMap(adjustments, baseline)) ['music', 'art', 'opera'] This gives the same ordering as a series of "dict.update()" calls starting with the last mapping: >>> combined = baseline.copy() >>> combined.update(adjustments) >>> list(combined) ['music', 'art', 'opera'] Changed in version 3.9: Added support for "|" and "|=" operators, specified in **PEP 584**. See also: * The MultiContext class in the Enthought CodeTools package has options to support writing to any mapping in the chain. * Django’s Context class for templating is a read-only chain of mappings. It also features pushing and popping of contexts similar to the "new_child()" method and the "parents" property. * The Nested Contexts recipe has options to control whether writes and other mutations apply only to the first mapping or to any mapping in the chain. * A greatly simplified read-only version of Chainmap. "ChainMap" Examples and Recipes ------------------------------- This section shows various approaches to working with chained maps.
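Example of using the "parents" property to skip the first mapping during lookups (a small sketch with invented values):

   from collections import ChainMap
   d = ChainMap({'x': 1}, {'x': 10, 'y': 20})
   d['x']              # 1 -- found in the first mapping
   d.parents['x']      # 10 -- 'parents' skips the first mapping
   d.parents           # ChainMap({'x': 10, 'y': 20})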
Example of simulating Python’s internal lookup chain: import builtins pylookup = ChainMap(locals(), globals(), vars(builtins)) Example of letting user specified command-line arguments take precedence over environment variables which in turn take precedence over default values: import os, argparse defaults = {'color': 'red', 'user': 'guest'} parser = argparse.ArgumentParser() parser.add_argument('-u', '--user') parser.add_argument('-c', '--color') namespace = parser.parse_args() command_line_args = {k: v for k, v in vars(namespace).items() if v is not None} combined = ChainMap(command_line_args, os.environ, defaults) print(combined['color']) print(combined['user']) Example patterns for using the "ChainMap" class to simulate nested contexts: c = ChainMap() # Create root context d = c.new_child() # Create nested child context e = c.new_child() # Child of c, independent from d e.maps[0] # Current context dictionary -- like Python's locals() e.maps[-1] # Root context -- like Python's globals() e.parents # Enclosing context chain -- like Python's nonlocals d['x'] = 1 # Set value in current context d['x'] # Get first key in the chain of contexts del d['x'] # Delete from current context list(d) # All nested values k in d # Check all nested values len(d) # Number of nested values d.items() # All nested items dict(d) # Flatten into a regular dictionary The "ChainMap" class only makes updates (writes and deletions) to the first mapping in the chain while lookups will search the full chain. However, if deep writes and deletions are desired, it is easy to make a subclass that updates keys found deeper in the chain: class DeepChainMap(ChainMap): 'Variant of ChainMap that allows direct updates to inner scopes' def __setitem__(self, key, value): for mapping in self.maps: if key in mapping: mapping[key] = value return self.maps[0][key] = value def __delitem__(self, key): for mapping in self.maps: if key in mapping: del mapping[key] return raise KeyError(key) >>> d = DeepChainMap({'zebra': 'black'}, {'elephant': 'blue'}, {'lion': 'yellow'}) >>> d['lion'] = 'orange' # update an existing key two levels down >>> d['snake'] = 'red' # new keys get added to the topmost dict >>> del d['elephant'] # remove an existing key one level down >>> d # display result DeepChainMap({'zebra': 'black', 'snake': 'red'}, {}, {'lion': 'orange'}) "Counter" objects ================= A counter tool is provided to support convenient and rapid tallies. For example: >>> # Tally occurrences of words in a list >>> cnt = Counter() >>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']: ... cnt[word] += 1 ... >>> cnt Counter({'blue': 3, 'red': 2, 'green': 1}) >>> # Find the ten most common words in Hamlet >>> import re >>> words = re.findall(r'\w+', open('hamlet.txt').read().lower()) >>> Counter(words).most_common(10) [('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631), ('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)] class collections.Counter([iterable-or-mapping]) A "Counter" is a "dict" subclass for counting *hashable* objects. It is a collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The "Counter" class is similar to bags or multisets in other languages. 
Elements are counted from an *iterable* or initialized from another *mapping* (or counter): >>> c = Counter() # a new, empty counter >>> c = Counter('gallahad') # a new counter from an iterable >>> c = Counter({'red': 4, 'blue': 2}) # a new counter from a mapping >>> c = Counter(cats=4, dogs=8) # a new counter from keyword args Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a "KeyError": >>> c = Counter(['eggs', 'ham']) >>> c['bacon'] # count of a missing element is zero 0 Setting a count to zero does not remove an element from a counter. Use "del" to remove it entirely: >>> c['sausage'] = 0 # counter entry with a zero count >>> del c['sausage'] # del actually removes the entry Added in version 3.1. Changed in version 3.7: As a "dict" subclass, "Counter" inherited the capability to remember insertion order. Math operations on *Counter* objects also preserve order. Results are ordered according to when an element is first encountered in the left operand and then by the order encountered in the right operand. Counter objects support additional methods beyond those available for all dictionaries: elements() Return an iterator over elements repeating each as many times as its count. Elements are returned in the order first encountered. If an element’s count is less than one, "elements()" will ignore it. >>> c = Counter(a=4, b=2, c=0, d=-2) >>> sorted(c.elements()) ['a', 'a', 'a', 'a', 'b', 'b'] most_common([n]) Return a list of the *n* most common elements and their counts from the most common to the least. If *n* is omitted or "None", "most_common()" returns *all* elements in the counter. Elements with equal counts are ordered in the order first encountered: >>> Counter('abracadabra').most_common(3) [('a', 5), ('b', 2), ('r', 2)] subtract([iterable-or-mapping]) Elements are subtracted from an *iterable* or from another *mapping* (or counter). Like "dict.update()" but subtracts counts instead of replacing them. Both inputs and outputs may be zero or negative. >>> c = Counter(a=4, b=2, c=0, d=-2) >>> d = Counter(a=1, b=2, c=3, d=4) >>> c.subtract(d) >>> c Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6}) Added in version 3.2. total() Compute the sum of the counts. >>> c = Counter(a=10, b=5, c=0) >>> c.total() 15 Added in version 3.10. The usual dictionary methods are available for "Counter" objects except for two which work differently for counters. fromkeys(iterable) This class method is not implemented for "Counter" objects. update([iterable-or-mapping]) Elements are counted from an *iterable* or added-in from another *mapping* (or counter). Like "dict.update()" but adds counts instead of replacing them. Also, the *iterable* is expected to be a sequence of elements, not a sequence of "(key, value)" pairs. Counters support rich comparison operators for equality, subset, and superset relationships: "==", "!=", "<", "<=", ">", ">=". All of those tests treat missing elements as having zero counts so that "Counter(a=1) == Counter(a=1, b=0)" returns true. Changed in version 3.10: Rich comparison operations were added. Changed in version 3.10: In equality tests, missing elements are treated as having zero counts. Formerly, "Counter(a=3)" and "Counter(a=3, b=0)" were considered distinct. 
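For example, the comparison operators behave as multiset subset and superset tests; a brief illustrative session:

   >>> from collections import Counter
   >>> Counter(a=1) == Counter(a=1, b=0)        # missing elements count as zero
   True
   >>> Counter(a=1, b=1) <= Counter(a=2, b=1)   # subset: every count <= its counterpart
   True
   >>> Counter(a=2, b=1) > Counter(a=1)         # proper superset
   True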
Common patterns for working with "Counter" objects: c.total() # total of all counts c.clear() # reset all counts list(c) # list unique elements set(c) # convert to a set dict(c) # convert to a regular dictionary c.items() # access the (elem, cnt) pairs Counter(dict(list_of_pairs)) # convert from a list of (elem, cnt) pairs c.most_common()[:-n-1:-1] # n least common elements +c # remove zero and negative counts Several mathematical operations are provided for combining "Counter" objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Equality and inclusion compare corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less. >>> c = Counter(a=3, b=1) >>> d = Counter(a=1, b=2) >>> c + d # add two counters together: c[x] + d[x] Counter({'a': 4, 'b': 3}) >>> c - d # subtract (keeping only positive counts) Counter({'a': 2}) >>> c & d # intersection: min(c[x], d[x]) Counter({'a': 1, 'b': 1}) >>> c | d # union: max(c[x], d[x]) Counter({'a': 3, 'b': 2}) >>> c == d # equality: c[x] == d[x] False >>> c <= d # inclusion: c[x] <= d[x] False Unary addition and subtraction are shortcuts for adding an empty counter or subtracting from an empty counter. >>> c = Counter(a=2, b=-4) >>> +c Counter({'a': 2}) >>> -c Counter({'b': 4}) Added in version 3.3: Added support for unary plus, unary minus, and in-place multiset operations. Note: Counters were primarily designed to work with positive integers to represent running counts; however, care was taken to not unnecessarily preclude use cases needing other types or negative values. To help with those use cases, this section documents the minimum range and type restrictions. * The "Counter" class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you *could* store anything in the value field. * The "most_common()" method requires only that the values be orderable. * For in-place operations such as "c[key] += 1", the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for "update()" and "subtract()" which allow negative and zero values for both inputs and outputs. * The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison. * The "elements()" method requires integer counts. It ignores zero and negative counts. See also: * Bag class in Smalltalk. * Wikipedia entry for Multisets. * C++ multisets tutorial with examples. * For mathematical operations on multisets and their use cases, see *Knuth, Donald. The Art of Computer Programming Volume II, Section 4.6.3, Exercise 19*. * To enumerate all distinct multisets of a given size over a given set of elements, see "itertools.combinations_with_replacement()": map(Counter, combinations_with_replacement('ABC', 2)) # --> AA AB AC BB BC CC "deque" objects =============== class collections.deque([iterable[, maxlen]]) Returns a new deque object initialized left-to-right (using "append()") with data from *iterable*. 
If *iterable* is not specified, the new deque is empty. Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same *O*(1) performance in either direction. Though "list" objects support similar operations, they are optimized for fast fixed-length operations and incur *O*(*n*) memory movement costs for "pop(0)" and "insert(0, v)" operations which change both the size and position of the underlying data representation. If *maxlen* is not specified or is "None", deques may grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end. Bounded length deques provide functionality similar to the "tail" filter in Unix. They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest. Deque objects support the following methods: append(x) Add *x* to the right side of the deque. appendleft(x) Add *x* to the left side of the deque. clear() Remove all elements from the deque leaving it with length 0. copy() Create a shallow copy of the deque. Added in version 3.5. count(x) Count the number of deque elements equal to *x*. Added in version 3.2. extend(iterable) Extend the right side of the deque by appending elements from the iterable argument. extendleft(iterable) Extend the left side of the deque by appending elements from *iterable*. Note, the series of left appends results in reversing the order of elements in the iterable argument. index(x[, start[, stop]]) Return the position of *x* in the deque (at or after index *start* and before index *stop*). Returns the first match or raises "ValueError" if not found. Added in version 3.5. insert(i, x) Insert *x* into the deque at position *i*. If the insertion would cause a bounded deque to grow beyond *maxlen*, an "IndexError" is raised. Added in version 3.5. pop() Remove and return an element from the right side of the deque. If no elements are present, raises an "IndexError". popleft() Remove and return an element from the left side of the deque. If no elements are present, raises an "IndexError". remove(value) Remove the first occurrence of *value*. If not found, raises a "ValueError". reverse() Reverse the elements of the deque in-place and then return "None". Added in version 3.2. rotate(n=1) Rotate the deque *n* steps to the right. If *n* is negative, rotate to the left. When the deque is not empty, rotating one step to the right is equivalent to "d.appendleft(d.pop())", and rotating one step to the left is equivalent to "d.append(d.popleft())". Deque objects also provide one read-only attribute: maxlen Maximum size of a deque or "None" if unbounded. Added in version 3.1. In addition to the above, deques support iteration, pickling, "len(d)", "reversed(d)", "copy.copy(d)", "copy.deepcopy(d)", membership testing with the "in" operator, and subscript references such as "d[0]" to access the first element. Indexed access is *O*(1) at both ends but slows to *O*(*n*) in the middle. For fast random access, use lists instead. Starting in version 3.5, deques support "__add__()", "__mul__()", and "__imul__()". Example: >>> from collections import deque >>> d = deque('ghi') # make a new deque with three items >>> for elem in d: # iterate over the deque's elements ... 
print(elem.upper()) G H I >>> d.append('j') # add a new entry to the right side >>> d.appendleft('f') # add a new entry to the left side >>> d # show the representation of the deque deque(['f', 'g', 'h', 'i', 'j']) >>> d.pop() # return and remove the rightmost item 'j' >>> d.popleft() # return and remove the leftmost item 'f' >>> list(d) # list the contents of the deque ['g', 'h', 'i'] >>> d[0] # peek at leftmost item 'g' >>> d[-1] # peek at rightmost item 'i' >>> list(reversed(d)) # list the contents of a deque in reverse ['i', 'h', 'g'] >>> 'h' in d # search the deque True >>> d.extend('jkl') # add multiple elements at once >>> d deque(['g', 'h', 'i', 'j', 'k', 'l']) >>> d.rotate(1) # right rotation >>> d deque(['l', 'g', 'h', 'i', 'j', 'k']) >>> d.rotate(-1) # left rotation >>> d deque(['g', 'h', 'i', 'j', 'k', 'l']) >>> deque(reversed(d)) # make a new deque in reverse order deque(['l', 'k', 'j', 'i', 'h', 'g']) >>> d.clear() # empty the deque >>> d.pop() # cannot pop from an empty deque Traceback (most recent call last): File "<stdin>", line 1, in <module> d.pop() IndexError: pop from an empty deque >>> d.extendleft('abc') # extendleft() reverses the input order >>> d deque(['c', 'b', 'a']) "deque" Recipes --------------- This section shows various approaches to working with deques. Bounded length deques provide functionality similar to the "tail" filter in Unix: def tail(filename, n=10): 'Return the last n lines of a file' with open(filename) as f: return deque(f, n) Another approach to using deques is to maintain a sequence of recently added elements by appending to the right and popping to the left: def moving_average(iterable, n=3): # moving_average([40, 30, 50, 46, 39, 44]) --> 40.0 42.0 45.0 43.0 # https://en.wikipedia.org/wiki/Moving_average it = iter(iterable) d = deque(itertools.islice(it, n-1)) d.appendleft(0) s = sum(d) for elem in it: s += elem - d.popleft() d.append(elem) yield s / n A round-robin scheduler can be implemented with input iterators stored in a "deque". Values are yielded from the active iterator in position zero. If that iterator is exhausted, it can be removed with "popleft()"; otherwise, it can be cycled back to the end with the "rotate()" method: def roundrobin(*iterables): "roundrobin('ABC', 'D', 'EF') --> A D E B F C" iterators = deque(map(iter, iterables)) while iterators: try: while True: yield next(iterators[0]) iterators.rotate(-1) except StopIteration: # Remove an exhausted iterator. iterators.popleft() The "rotate()" method provides a way to implement "deque" slicing and deletion. For example, a pure Python implementation of "del d[n]" relies on the "rotate()" method to position elements to be popped: def delete_nth(d, n): d.rotate(-n) d.popleft() d.rotate(n) To implement "deque" slicing, use a similar approach applying "rotate()" to bring a target element to the left side of the deque. Remove old entries with "popleft()", add new entries with "extend()", and then reverse the rotation. With minor variations on that approach, it is easy to implement Forth style stack manipulations such as "dup", "drop", "swap", "over", "pick", "rot", and "roll". "defaultdict" objects ===================== class collections.defaultdict(default_factory=None, /[, ...]) Return a new dictionary-like object. "defaultdict" is a subclass of the built-in "dict" class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for the "dict" class and is not documented here.
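As a quick illustration of those two additions before the details below (the key name here is arbitrary):

   >>> from collections import defaultdict
   >>> dd = defaultdict(list)   # "default_factory" is the added instance variable
   >>> dd.default_factory
   <class 'list'>
   >>> dd['missing']            # a failed lookup calls __missing__(), which calls list()
   []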
The first argument provides the initial value for the "default_factory" attribute; it defaults to "None". All remaining arguments are treated the same as if they were passed to the "dict" constructor, including keyword arguments. "defaultdict" objects support the following method in addition to the standard "dict" operations: __missing__(key) If the "default_factory" attribute is "None", this raises a "KeyError" exception with the *key* as argument. If "default_factory" is not "None", it is called without arguments to provide a default value for the given *key*, this value is inserted in the dictionary for the *key*, and returned. If calling "default_factory" raises an exception this exception is propagated unchanged. This method is called by the "__getitem__()" method of the "dict" class when the requested key is not found; whatever it returns or raises is then returned or raised by "__getitem__()". Note that "__missing__()" is *not* called for any operations besides "__getitem__()". This means that "get()" will, like normal dictionaries, return "None" as a default rather than using "default_factory". "defaultdict" objects support the following instance variable: default_factory This attribute is used by the "__missing__()" method; it is initialized from the first argument to the constructor, if present, or to "None", if absent. Changed in version 3.9: Added merge ("|") and update ("|=") operators, specified in **PEP 584**. "defaultdict" Examples ---------------------- Using "list" as the "default_factory", it is easy to group a sequence of key-value pairs into a dictionary of lists: >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)] >>> d = defaultdict(list) >>> for k, v in s: ... d[k].append(v) ... >>> sorted(d.items()) [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])] When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the "default_factory" function which returns an empty "list". The "list.append()" operation then attaches the value to the new list. When keys are encountered again, the look-up proceeds normally (returning the list for that key) and the "list.append()" operation adds another value to the list. This technique is simpler and faster than an equivalent technique using "dict.setdefault()": >>> d = {} >>> for k, v in s: ... d.setdefault(k, []).append(v) ... >>> sorted(d.items()) [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])] Setting the "default_factory" to "int" makes the "defaultdict" useful for counting (like a bag or multiset in other languages): >>> s = 'mississippi' >>> d = defaultdict(int) >>> for k in s: ... d[k] += 1 ... >>> sorted(d.items()) [('i', 4), ('m', 1), ('p', 2), ('s', 4)] When a letter is first encountered, it is missing from the mapping, so the "default_factory" function calls "int()" to supply a default count of zero. The increment operation then builds up the count for each letter. The function "int()" which always returns zero is just a special case of constant functions. A faster and more flexible way to create constant functions is to use a lambda function which can supply any constant value (not just zero): >>> def constant_factory(value): ... return lambda: value ... 
>>> d = defaultdict(constant_factory('')) >>> d.update(name='John', action='ran') >>> '%(name)s %(action)s to %(object)s' % d 'John ran to ' Setting the "default_factory" to "set" makes the "defaultdict" useful for building a dictionary of sets: >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)] >>> d = defaultdict(set) >>> for k, v in s: ... d[k].add(v) ... >>> sorted(d.items()) [('blue', {2, 4}), ('red', {1, 3})] "namedtuple()" Factory Function for Tuples with Named Fields ============================================================ Named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of position index. collections.namedtuple(typename, field_names, *, rename=False, defaults=None, module=None) Returns a new tuple subclass named *typename*. The new subclass is used to create tuple-like objects that have fields accessible by attribute lookup as well as being indexable and iterable. Instances of the subclass also have a helpful docstring (with *typename* and *field_names*) and a helpful "__repr__()" method which lists the tuple contents in a "name=value" format. The *field_names* are a sequence of strings such as "['x', 'y']". Alternatively, *field_names* can be a single string with each fieldname separated by whitespace and/or commas, for example "'x y'" or "'x, y'". Any valid Python identifier may be used for a fieldname except for names starting with an underscore. Valid identifiers consist of letters, digits, and underscores but do not start with a digit or underscore and cannot be a "keyword" such as *class*, *for*, *return*, *global*, *pass*, or *raise*. If *rename* is true, invalid fieldnames are automatically replaced with positional names. For example, "['abc', 'def', 'ghi', 'abc']" is converted to "['abc', '_1', 'ghi', '_3']", eliminating the keyword "def" and the duplicate fieldname "abc". *defaults* can be "None" or an *iterable* of default values. Since fields with a default value must come after any fields without a default, the *defaults* are applied to the rightmost parameters. For example, if the fieldnames are "['x', 'y', 'z']" and the defaults are "(1, 2)", then "x" will be a required argument, "y" will default to "1", and "z" will default to "2". If *module* is defined, the "__module__" attribute of the named tuple is set to that value. Named tuple instances do not have per-instance dictionaries, so they are lightweight and require no more memory than regular tuples. To support pickling, the named tuple class should be assigned to a variable that matches *typename*. Changed in version 3.1: Added support for *rename*. Changed in version 3.6: The *verbose* and *rename* parameters became keyword-only arguments. Changed in version 3.6: Added the *module* parameter. Changed in version 3.7: Removed the *verbose* parameter and the "_source" attribute. Changed in version 3.7: Added the *defaults* parameter and the "_field_defaults" attribute. 
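As a quick check of the *rename* and *defaults* conversions described above (the type names are illustrative):

   >>> from collections import namedtuple
   >>> Row = namedtuple('Row', ['abc', 'def', 'ghi', 'abc'], rename=True)
   >>> Row._fields              # 'def' and the duplicate 'abc' were renamed
   ('abc', '_1', 'ghi', '_3')
   >>> Point3 = namedtuple('Point3', ['x', 'y', 'z'], defaults=(1, 2))
   >>> Point3(0)                # defaults apply to the rightmost fields
   Point3(x=0, y=1, z=2)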
>>> # Basic example >>> Point = namedtuple('Point', ['x', 'y']) >>> p = Point(11, y=22) # instantiate with positional or keyword arguments >>> p[0] + p[1] # indexable like the plain tuple (11, 22) 33 >>> x, y = p # unpack like a regular tuple >>> x, y (11, 22) >>> p.x + p.y # fields also accessible by name 33 >>> p # readable __repr__ with a name=value style Point(x=11, y=22) Named tuples are especially useful for assigning field names to result tuples returned by the "csv" or "sqlite3" modules: EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade') import csv for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", newline=''))): print(emp.name, emp.title) import sqlite3 conn = sqlite3.connect('/companydata') cursor = conn.cursor() cursor.execute('SELECT name, age, title, department, paygrade FROM employees') for emp in map(EmployeeRecord._make, cursor.fetchall()): print(emp.name, emp.title) In addition to the methods inherited from tuples, named tuples support three additional methods and two attributes. To prevent conflicts with field names, the method and attribute names start with an underscore. classmethod somenamedtuple._make(iterable) Class method that makes a new instance from an existing sequence or iterable. >>> t = [11, 22] >>> Point._make(t) Point(x=11, y=22) somenamedtuple._asdict() Return a new "dict" which maps field names to their corresponding values: >>> p = Point(x=11, y=22) >>> p._asdict() {'x': 11, 'y': 22} Changed in version 3.1: Returns an "OrderedDict" instead of a regular "dict". Changed in version 3.8: Returns a regular "dict" instead of an "OrderedDict". As of Python 3.7, regular dicts are guaranteed to be ordered. If the extra features of "OrderedDict" are required, the suggested remediation is to cast the result to the desired type: "OrderedDict(nt._asdict())". somenamedtuple._replace(**kwargs) Return a new instance of the named tuple replacing specified fields with new values: >>> p = Point(x=11, y=22) >>> p._replace(x=33) Point(x=33, y=22) >>> for partnum, record in inventory.items(): ... inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now()) Named tuples are also supported by the generic function "copy.replace()". Changed in version 3.13: Raise "TypeError" instead of "ValueError" for invalid keyword arguments. somenamedtuple._fields Tuple of strings listing the field names. Useful for introspection and for creating new named tuple types from existing named tuples. >>> p._fields # view the field names ('x', 'y') >>> Color = namedtuple('Color', 'red green blue') >>> Pixel = namedtuple('Pixel', Point._fields + Color._fields) >>> Pixel(11, 22, 128, 255, 0) Pixel(x=11, y=22, red=128, green=255, blue=0) somenamedtuple._field_defaults Dictionary mapping field names to default values. >>> Account = namedtuple('Account', ['type', 'balance'], defaults=[0]) >>> Account._field_defaults {'balance': 0} >>> Account('premium') Account(type='premium', balance=0) To retrieve a field whose name is stored in a string, use the "getattr()" function: >>> getattr(p, 'x') 11 To convert a dictionary to a named tuple, use the double-star-operator (as described in Unpacking Argument Lists): >>> d = {'x': 11, 'y': 22} >>> Point(**d) Point(x=11, y=22) Since a named tuple is a regular Python class, it is easy to add or change functionality with a subclass. Here is how to add a calculated field and a fixed-width print format: >>> class Point(namedtuple('Point', ['x', 'y'])): ... __slots__ = () ... @property ...
def hypot(self): ... return (self.x ** 2 + self.y ** 2) ** 0.5 ... def __str__(self): ... return 'Point: x=%6.3f y=%6.3f hypot=%6.3f' % (self.x, self.y, self.hypot) >>> for p in Point(3, 4), Point(14, 5/7): ... print(p) Point: x= 3.000 y= 4.000 hypot= 5.000 Point: x=14.000 y= 0.714 hypot=14.018 The subclass shown above sets "__slots__" to an empty tuple. This helps keep memory requirements low by preventing the creation of instance dictionaries. Subclassing is not useful for adding new, stored fields. Instead, simply create a new named tuple type from the "_fields" attribute: >>> Point3D = namedtuple('Point3D', Point._fields + ('z',)) Docstrings can be customized by making direct assignments to the "__doc__" fields: >>> Book = namedtuple('Book', ['id', 'title', 'authors']) >>> Book.__doc__ += ': Hardcover book in active collection' >>> Book.id.__doc__ = '13-digit ISBN' >>> Book.title.__doc__ = 'Title of first printing' >>> Book.authors.__doc__ = 'List of authors sorted by last name' Changed in version 3.5: Property docstrings became writeable. See also: * See "typing.NamedTuple" for a way to add type hints for named tuples. It also provides an elegant notation using the "class" keyword: class Component(NamedTuple): part_number: int weight: float description: Optional[str] = None * See "types.SimpleNamespace()" for a mutable namespace based on an underlying dictionary instead of a tuple. * The "dataclasses" module provides a decorator and functions for automatically adding generated special methods to user-defined classes. "OrderedDict" objects ===================== Ordered dictionaries are just like regular dictionaries but have some extra capabilities relating to ordering operations. They have become less important now that the built-in "dict" class gained the ability to remember insertion order (this new behavior became guaranteed in Python 3.7). Some differences from "dict" still remain: * The regular "dict" was designed to be very good at mapping operations. Tracking insertion order was secondary. * The "OrderedDict" was designed to be good at reordering operations. Space efficiency, iteration speed, and the performance of update operations were secondary. * The "OrderedDict" algorithm can handle frequent reordering operations better than "dict". As shown in the recipes below, this makes it suitable for implementing various kinds of LRU caches. * The equality operation for "OrderedDict" checks for matching order. A regular "dict" can emulate the order sensitive equality test with "p == q and all(k1 == k2 for k1, k2 in zip(p, q))". * The "popitem()" method of "OrderedDict" has a different signature. It accepts an optional argument to specify which item is popped. A regular "dict" can emulate OrderedDict’s "od.popitem(last=True)" with "d.popitem()" which is guaranteed to pop the rightmost (last) item. A regular "dict" can emulate OrderedDict’s "od.popitem(last=False)" with "(k := next(iter(d)), d.pop(k))" which will return and remove the leftmost (first) item if it exists. * "OrderedDict" has a "move_to_end()" method to efficiently reposition an element to an endpoint. A regular "dict" can emulate OrderedDict’s "od.move_to_end(k, last=True)" with "d[k] = d.pop(k)" which will move the key and its associated value to the rightmost (last) position. A regular "dict" does not have an efficient equivalent for OrderedDict’s "od.move_to_end(k, last=False)" which moves the key and its associated value to the leftmost (first) position. * Until Python 3.8, "dict" lacked a "__reversed__()" method. 
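A short session sketching the "dict" emulations from the list above (the keys are example values only):

   >>> d = dict.fromkeys('abcde')
   >>> d.popitem()                      # like od.popitem(last=True)
   ('e', None)
   >>> (k := next(iter(d)), d.pop(k))   # like od.popitem(last=False)
   ('a', None)
   >>> d['b'] = d.pop('b')              # like od.move_to_end('b', last=True)
   >>> ''.join(d)
   'cdb'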
class collections.OrderedDict([items]) Return an instance of a "dict" subclass that has methods specialized for rearranging dictionary order. Added in version 3.1. popitem(last=True) The "popitem()" method for ordered dictionaries returns and removes a (key, value) pair. The pairs are returned in LIFO (last-in, first-out) order if *last* is true or FIFO (first-in, first-out) order if false. move_to_end(key, last=True) Move an existing *key* to either end of an ordered dictionary. The item is moved to the right end if *last* is true (the default) or to the beginning if *last* is false. Raises "KeyError" if the *key* does not exist: >>> d = OrderedDict.fromkeys('abcde') >>> d.move_to_end('b') >>> ''.join(d) 'acdeb' >>> d.move_to_end('b', last=False) >>> ''.join(d) 'bacde' Added in version 3.2. In addition to the usual mapping methods, ordered dictionaries also support reverse iteration using "reversed()". Equality tests between "OrderedDict" objects are order-sensitive and are roughly equivalent to "list(od1.items())==list(od2.items())". Equality tests between "OrderedDict" objects and other "Mapping" objects are order-insensitive like regular dictionaries. This allows "OrderedDict" objects to be substituted anywhere a regular dictionary is used. Changed in version 3.5: The items, keys, and values *views* of "OrderedDict" now support reverse iteration using "reversed()". Changed in version 3.6: With the acceptance of **PEP 468**, order is retained for keyword arguments passed to the "OrderedDict" constructor and its "update()" method. Changed in version 3.9: Added merge ("|") and update ("|=") operators, specified in **PEP 584**. "OrderedDict" Examples and Recipes ---------------------------------- It is straightforward to create an ordered dictionary variant that remembers the order the keys were *last* inserted. If a new entry overwrites an existing entry, the original insertion position is changed and moved to the end: class LastUpdatedOrderedDict(OrderedDict): 'Store items in the order the keys were last added' def __setitem__(self, key, value): super().__setitem__(key, value) self.move_to_end(key) An "OrderedDict" would also be useful for implementing variants of "functools.lru_cache()": from collections import OrderedDict from time import time class TimeBoundedLRU: "LRU Cache that invalidates and refreshes old entries." def __init__(self, func, maxsize=128, maxage=30): self.cache = OrderedDict() # { args : (timestamp, result)} self.func = func self.maxsize = maxsize self.maxage = maxage def __call__(self, *args): if args in self.cache: self.cache.move_to_end(args) timestamp, result = self.cache[args] if time() - timestamp <= self.maxage: return result result = self.func(*args) self.cache[args] = time(), result if len(self.cache) > self.maxsize: self.cache.popitem(last=False) return result class MultiHitLRUCache: """ LRU cache that defers caching a result until it has been requested multiple times. To avoid flushing the LRU cache with one-time requests, we don't cache until a request has been made more than once. 
""" def __init__(self, func, maxsize=128, maxrequests=4096, cache_after=1): self.requests = OrderedDict() # { uncached_key : request_count } self.cache = OrderedDict() # { cached_key : function_result } self.func = func self.maxrequests = maxrequests # max number of uncached requests self.maxsize = maxsize # max number of stored return values self.cache_after = cache_after def __call__(self, *args): if args in self.cache: self.cache.move_to_end(args) return self.cache[args] result = self.func(*args) self.requests[args] = self.requests.get(args, 0) + 1 if self.requests[args] <= self.cache_after: self.requests.move_to_end(args) if len(self.requests) > self.maxrequests: self.requests.popitem(last=False) else: self.requests.pop(args, None) self.cache[args] = result if len(self.cache) > self.maxsize: self.cache.popitem(last=False) return result "UserDict" objects ================== The class, "UserDict" acts as a wrapper around dictionary objects. The need for this class has been partially supplanted by the ability to subclass directly from "dict"; however, this class can be easier to work with because the underlying dictionary is accessible as an attribute. class collections.UserDict([initialdata]) Class that simulates a dictionary. The instance’s contents are kept in a regular dictionary, which is accessible via the "data" attribute of "UserDict" instances. If *initialdata* is provided, "data" is initialized with its contents; note that a reference to *initialdata* will not be kept, allowing it to be used for other purposes. In addition to supporting the methods and operations of mappings, "UserDict" instances provide the following attribute: data A real dictionary used to store the contents of the "UserDict" class. "UserList" objects ================== This class acts as a wrapper around list objects. It is a useful base class for your own list-like classes which can inherit from them and override existing methods or add new ones. In this way, one can add new behaviors to lists. The need for this class has been partially supplanted by the ability to subclass directly from "list"; however, this class can be easier to work with because the underlying list is accessible as an attribute. class collections.UserList([list]) Class that simulates a list. The instance’s contents are kept in a regular list, which is accessible via the "data" attribute of "UserList" instances. The instance’s contents are initially set to a copy of *list*, defaulting to the empty list "[]". *list* can be any iterable, for example a real Python list or a "UserList" object. In addition to supporting the methods and operations of mutable sequences, "UserList" instances provide the following attribute: data A real "list" object used to store the contents of the "UserList" class. **Subclassing requirements:** Subclasses of "UserList" are expected to offer a constructor which can be called with either no arguments or one argument. List operations which return a new sequence attempt to create an instance of the actual implementation class. To do so, it assumes that the constructor can be called with a single parameter, which is a sequence object used as a data source. If a derived class does not wish to comply with this requirement, all of the special methods supported by this class will need to be overridden; please consult the sources for information about the methods which need to be provided in that case. "UserString" objects ==================== The class, "UserString" acts as a wrapper around string objects. 
The need for this class has been partially supplanted by the ability to subclass directly from "str"; however, this class can be easier to work with because the underlying string is accessible as an attribute. class collections.UserString(seq) Class that simulates a string object. The instance’s content is kept in a regular string object, which is accessible via the "data" attribute of "UserString" instances. The instance’s contents are initially set to a copy of *seq*. The *seq* argument can be any object which can be converted into a string using the built-in "str()" function. In addition to supporting the methods and operations of strings, "UserString" instances provide the following attribute: data A real "str" object used to store the contents of the "UserString" class. Changed in version 3.5: New methods "__getnewargs__", "__rmod__", "casefold", "format_map", "isprintable", and "maketrans". "colorsys" — Conversions between color systems ********************************************** **Source code:** Lib/colorsys.py ====================================================================== The "colorsys" module defines bidirectional conversions of color values between colors expressed in the RGB (Red Green Blue) color space used in computer monitors and three other coordinate systems: YIQ, HLS (Hue Lightness Saturation) and HSV (Hue Saturation Value). Coordinates in all of these color spaces are floating-point values. In the YIQ space, the Y coordinate is between 0 and 1, but the I and Q coordinates can be positive or negative. In all other spaces, the coordinates are all between 0 and 1. See also: More information about color spaces can be found at https://poynton.ca/ColorFAQ.html and https://www.cambridgeincolour.com/tutorials/color-spaces.htm. The "colorsys" module defines the following functions: colorsys.rgb_to_yiq(r, g, b) Convert the color from RGB coordinates to YIQ coordinates. colorsys.yiq_to_rgb(y, i, q) Convert the color from YIQ coordinates to RGB coordinates. colorsys.rgb_to_hls(r, g, b) Convert the color from RGB coordinates to HLS coordinates. colorsys.hls_to_rgb(h, l, s) Convert the color from HLS coordinates to RGB coordinates. colorsys.rgb_to_hsv(r, g, b) Convert the color from RGB coordinates to HSV coordinates. colorsys.hsv_to_rgb(h, s, v) Convert the color from HSV coordinates to RGB coordinates. Example: >>> import colorsys >>> colorsys.rgb_to_hsv(0.2, 0.4, 0.4) (0.5, 0.5, 0.4) >>> colorsys.hsv_to_rgb(0.5, 0.5, 0.4) (0.2, 0.4, 0.4) "compileall" — Byte-compile Python libraries ******************************************** **Source code:** Lib/compileall.py ====================================================================== This module provides some utility functions to support installing Python libraries. These functions compile Python source files in a directory tree. This module can be used to create the cached byte-code files at library installation time, which makes them available for use even by users who don’t have write permission to the library directories. Availability: not WASI. This module does not work or is not available on WebAssembly. See WebAssembly platforms for more information. Command-line use ================ This module can work as a script (using **python -m compileall**) to compile Python sources. directory ... file ... Positional arguments are files to compile or directories that contain source files, traversed recursively. If no argument is given, behave as if the command line was "-l **". 
-l Do not recurse into subdirectories, only compile source code files directly contained in the named or implied directories. -f Force rebuild even if timestamps are up-to-date. -q Do not print the list of files compiled. If passed once, error messages will still be printed. If passed twice ("-qq"), all output is suppressed. -d destdir Directory prepended to the path to each file being compiled. This will appear in compilation time tracebacks, and is also compiled in to the byte-code file, where it will be used in tracebacks and other messages in cases where the source file does not exist at the time the byte-code file is executed. -s strip_prefix Remove the given prefix from paths recorded in the ".pyc" files. Paths are made relative to the prefix. This option can be used with "-p" but not with "-d". -p prepend_prefix Prepend the given prefix to paths recorded in the ".pyc" files. Use "-p /" to make the paths absolute. This option can be used with "-s" but not with "-d". -x regex regex is used to search the full path to each file considered for compilation, and if the regex produces a match, the file is skipped. -i list Read the file "list" and add each line that it contains to the list of files and directories to compile. If "list" is "-", read lines from "stdin". -b Write the byte-code files to their legacy locations and names, which may overwrite byte-code files created by another version of Python. The default is to write files to their **PEP 3147** locations and names, which allows byte-code files from multiple versions of Python to coexist. -r Control the maximum recursion level for subdirectories. If this is given, then the "-l" option will not be taken into account. **python -m compileall -r 0** is equivalent to **python -m compileall -l**. -j N Use *N* workers to compile the files within the given directory. If "0" is used, then the result of "os.process_cpu_count()" will be used. --invalidation-mode [timestamp|checked-hash|unchecked-hash] Control how the generated byte-code files are invalidated at runtime. The "timestamp" value means that ".pyc" files with the source timestamp and size embedded will be generated. The "checked-hash" and "unchecked-hash" values cause hash-based pycs to be generated. Hash-based pycs embed a hash of the source file contents rather than a timestamp. See Cached bytecode invalidation for more information on how Python validates bytecode cache files at runtime. The default is "timestamp" if the "SOURCE_DATE_EPOCH" environment variable is not set, and "checked-hash" if the "SOURCE_DATE_EPOCH" environment variable is set. -o level Compile with the given optimization level. May be used multiple times to compile for multiple levels at a time (for example, "compileall -o 1 -o 2"). -e dir Ignore symlinks pointing outside the given directory. --hardlink-dupes If two ".pyc" files with different optimization level have the same content, use hard links to consolidate duplicate files. Changed in version 3.2: Added the "-i", "-b" and "-h" options. Changed in version 3.5: Added the "-j", "-r", and "-qq" options. "-q" option was changed to a multilevel value. "-b" will always produce a byte-code file ending in ".pyc", never ".pyo". Changed in version 3.7: Added the "--invalidation-mode" option. Changed in version 3.9: Added the "-s", "-p", "-e" and "--hardlink-dupes" options. Raised the default recursion limit from 10 to "sys.getrecursionlimit()". Added the possibility to specify the "-o" option multiple times.
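For reference, most of the options above correspond to keyword arguments of the public functions documented below. A rough, illustrative sketch (the directory name and exclusion pattern are hypothetical):

   import re
   import compileall

   # Roughly equivalent to: python -m compileall -f -q -x '[/\\]tests[/\\]' project/
   compileall.compile_dir(
       'project/',                          # hypothetical source tree
       force=True,                          # -f: rebuild even if timestamps are current
       quiet=1,                             # -q: print error messages only
       rx=re.compile(r'[/\\]tests[/\\]'),   # -x: skip paths matching the regex
   )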
There is no command-line option to control the optimization level used by the "compile()" function, because the Python interpreter itself already provides the option: **python -O -m compileall**. Similarly, the "compile()" function respects the "sys.pycache_prefix" setting. The generated bytecode cache will only be useful if "compile()" is run with the same "sys.pycache_prefix" (if any) that will be used at runtime. Public functions ================ compileall.compile_dir(dir, maxlevels=sys.getrecursionlimit(), ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, workers=1, invalidation_mode=None, *, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False) Recursively descend the directory tree named by *dir*, compiling all ".py" files along the way. Return a true value if all the files compiled successfully, and a false value otherwise. The *maxlevels* parameter is used to limit the depth of the recursion; it defaults to "sys.getrecursionlimit()". If *ddir* is given, it is prepended to the path to each file being compiled for use in compilation time tracebacks, and is also compiled in to the byte-code file, where it will be used in tracebacks and other messages in cases where the source file does not exist at the time the byte-code file is executed. If *force* is true, modules are re-compiled even if the timestamps are up to date. If *rx* is given, its "search" method is called on the complete path to each file considered for compilation, and if it returns a true value, the file is skipped. This can be used to exclude files matching a regular expression, given as a re.Pattern object. If *quiet* is "False" or "0" (the default), the filenames and other information are printed to standard out. Set to "1", only errors are printed. Set to "2", all output is suppressed. If *legacy* is true, byte-code files are written to their legacy locations and names, which may overwrite byte-code files created by another version of Python. The default is to write files to their **PEP 3147** locations and names, which allows byte-code files from multiple versions of Python to coexist. *optimize* specifies the optimization level for the compiler. It is passed to the built-in "compile()" function. Accepts also a sequence of optimization levels which lead to multiple compilations of one ".py" file in one call. The argument *workers* specifies how many workers are used to compile files in parallel. The default is to not use multiple workers. If the platform can’t use multiple workers and the *workers* argument is given, then sequential compilation will be used as a fallback. If *workers* is 0, the number of cores in the system is used. If *workers* is lower than "0", a "ValueError" will be raised. *invalidation_mode* should be a member of the "py_compile.PycInvalidationMode" enum and controls how the generated pycs are invalidated at runtime. The *stripdir*, *prependdir* and *limit_sl_dest* arguments correspond to the "-s", "-p" and "-e" options described above. They may be specified as "str" or "os.PathLike". If *hardlink_dupes* is true and two ".pyc" files with different optimization level have the same content, use hard links to consolidate duplicate files. Changed in version 3.2: Added the *legacy* and *optimize* parameters. Changed in version 3.5: Added the *workers* parameter. Changed in version 3.5: *quiet* parameter was changed to a multilevel value.
Changed in version 3.5: The *legacy* parameter only writes out ".pyc" files, not ".pyo" files no matter what the value of *optimize* is. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.7: The *invalidation_mode* parameter was added. Changed in version 3.7.2: The *invalidation_mode* parameter’s default value is updated to "None". Changed in version 3.8: Setting *workers* to 0 now chooses the optimal number of cores. Changed in version 3.9: Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments. Default value of *maxlevels* was changed from "10" to "sys.getrecursionlimit()" compileall.compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, invalidation_mode=None, *, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False) Compile the file with path *fullname*. Return a true value if the file compiled successfully, and a false value otherwise. If *ddir* is given, it is prepended to the path to the file being compiled for use in compilation time tracebacks, and is also compiled in to the byte-code file, where it will be used in tracebacks and other messages in cases where the source file does not exist at the time the byte-code file is executed. If *rx* is given, its "search" method is passed the full path name to the file being compiled, and if it returns a true value, the file is not compiled and "True" is returned. This can be used to exclude files matching a regular expression, given as a re.Pattern object. If *quiet* is "False" or "0" (the default), the filenames and other information are printed to standard out. Set to "1", only errors are printed. Set to "2", all output is suppressed. If *legacy* is true, byte-code files are written to their legacy locations and names, which may overwrite byte-code files created by another version of Python. The default is to write files to their **PEP 3147** locations and names, which allows byte-code files from multiple versions of Python to coexist. *optimize* specifies the optimization level for the compiler. It is passed to the built-in "compile()" function. Accepts also a sequence of optimization levels which lead to multiple compilations of one ".py" file in one call. *invalidation_mode* should be a member of the "py_compile.PycInvalidationMode" enum and controls how the generated pycs are invalidated at runtime. The *stripdir*, *prependdir* and *limit_sl_dest* arguments correspond to the "-s", "-p" and "-e" options described above. They may be specified as "str" or "os.PathLike". If *hardlink_dupes* is true and two ".pyc" files with different optimization level have the same content, use hard links to consolidate duplicate files. Added in version 3.2. Changed in version 3.5: *quiet* parameter was changed to a multilevel value. Changed in version 3.5: The *legacy* parameter only writes out ".pyc" files, not ".pyo" files no matter what the value of *optimize* is. Changed in version 3.7: The *invalidation_mode* parameter was added. Changed in version 3.7.2: The *invalidation_mode* parameter’s default value is updated to "None". Changed in version 3.9: Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments. compileall.compile_path(skip_curdir=True, maxlevels=0, force=False, quiet=0, legacy=False, optimize=-1, invalidation_mode=None) Byte-compile all the ".py" files found along "sys.path". Return a true value if all the files compiled successfully, and a false value otherwise. 
If *skip_curdir* is true (the default), the current directory is not included in the search. All other parameters are passed to the "compile_dir()" function. Note that unlike the other compile functions, "maxlevels" defaults to "0". Changed in version 3.2: Added the *legacy* and *optimize* parameters. Changed in version 3.5: *quiet* parameter was changed to a multilevel value. Changed in version 3.5: The *legacy* parameter only writes out ".pyc" files, not ".pyo" files no matter what the value of *optimize* is. Changed in version 3.7: The *invalidation_mode* parameter was added. Changed in version 3.7.2: The *invalidation_mode* parameter’s default value is updated to "None". To force a recompile of all the ".py" files in the "Lib/" subdirectory and all its subdirectories: import compileall compileall.compile_dir('Lib/', force=True) # Perform same compilation, excluding files in .svn directories. import re compileall.compile_dir('Lib/', rx=re.compile(r'[/\\][.]svn'), force=True) # pathlib.Path objects can also be used. import pathlib compileall.compile_dir(pathlib.Path('Lib/'), force=True) See also: Module "py_compile" Byte-compile a single source file. Concurrent Execution ******************** The modules described in this chapter provide support for concurrent execution of code. The appropriate choice of tool will depend on the task to be executed (CPU bound vs IO bound) and preferred style of development (event driven cooperative multitasking vs preemptive multitasking). Here’s an overview: * "threading" — Thread-based parallelism * Introduction * GIL and performance considerations * Reference * Thread-local data * Thread objects * Lock objects * RLock objects * Condition objects * Semaphore objects * "Semaphore" example * Event objects * Timer objects * Barrier objects * Using locks, conditions, and semaphores in the "with" statement * "multiprocessing" — Process-based parallelism * Introduction * The "Process" class * Contexts and start methods * Exchanging objects between processes * Synchronization between processes * Sharing state between processes * Using a pool of workers * Reference * "Process" and exceptions * Pipes and Queues * Miscellaneous * Connection Objects * Synchronization primitives * Shared "ctypes" Objects * The "multiprocessing.sharedctypes" module * Managers * Customized managers * Using a remote manager * Proxy Objects * Cleanup * Process Pools * Listeners and Clients * Address Formats * Authentication keys * Logging * The "multiprocessing.dummy" module * Programming guidelines * All start methods * The *spawn* and *forkserver* start methods * Examples * "multiprocessing.shared_memory" — Shared memory for direct access across processes * The "concurrent" package * "concurrent.futures" — Launching parallel tasks * Executor Objects * ThreadPoolExecutor * ThreadPoolExecutor Example * ProcessPoolExecutor * ProcessPoolExecutor Example * Future Objects * Module Functions * Exception classes * "subprocess" — Subprocess management * Using the "subprocess" Module * Frequently Used Arguments * Popen Constructor * Exceptions * Security Considerations * Popen Objects * Windows Popen Helpers * Windows Constants * Older high-level API * Replacing Older Functions with the "subprocess" Module * Replacing **/bin/sh** shell command substitution * Replacing shell pipeline * Replacing "os.system()" * Replacing the "os.spawn" family * Replacing "os.popen()", "os.popen2()", "os.popen3()" * Replacing functions from the "popen2" module * Legacy Shell Invocation Functions * Notes * Timeout
Behavior * Converting an argument sequence to a string on Windows * Disabling use of "vfork()" or "posix_spawn()" * "sched" — Event scheduler * Scheduler Objects * "queue" — A synchronized queue class * Queue Objects * Terminating queues * SimpleQueue Objects * "contextvars" — Context Variables * Context Variables * Manual Context Management * asyncio support The following are support modules for some of the above services: * "_thread" — Low-level threading API "concurrent.futures" — Launching parallel tasks *********************************************** Added in version 3.2. **Source code:** Lib/concurrent/futures/thread.py and Lib/concurrent/futures/process.py ====================================================================== The "concurrent.futures" module provides a high-level interface for asynchronously executing callables. The asynchronous execution can be performed with threads, using "ThreadPoolExecutor", or separate processes, using "ProcessPoolExecutor". Both implement the same interface, which is defined by the abstract "Executor" class. Availability: not WASI. This module does not work or is not available on WebAssembly. See WebAssembly platforms for more information. Executor Objects ================ class concurrent.futures.Executor An abstract class that provides methods to execute calls asynchronously. It should not be used directly, but through its concrete subclasses. submit(fn, /, *args, **kwargs) Schedules the callable, *fn*, to be executed as "fn(*args, **kwargs)" and returns a "Future" object representing the execution of the callable. with ThreadPoolExecutor(max_workers=1) as executor: future = executor.submit(pow, 323, 1235) print(future.result()) map(fn, *iterables, timeout=None, chunksize=1) Similar to "map(fn, *iterables)" except: * the *iterables* are collected immediately rather than lazily; * *fn* is executed asynchronously and several calls to *fn* may be made concurrently. The returned iterator raises a "TimeoutError" if "__next__()" is called and the result isn’t available after *timeout* seconds from the original call to "Executor.map()". *timeout* can be an int or a float. If *timeout* is not specified or "None", there is no limit to the wait time. If a *fn* call raises an exception, then that exception will be raised when its value is retrieved from the iterator. When using "ProcessPoolExecutor", this method chops *iterables* into a number of chunks which it submits to the pool as separate tasks. The (approximate) size of these chunks can be specified by setting *chunksize* to a positive integer. For very long iterables, using a large value for *chunksize* can significantly improve performance compared to the default size of 1. With "ThreadPoolExecutor", *chunksize* has no effect. Changed in version 3.5: Added the *chunksize* argument. shutdown(wait=True, *, cancel_futures=False) Signal the executor that it should free any resources that it is using when the currently pending futures are done executing. Calls to "Executor.submit()" and "Executor.map()" made after shutdown will raise "RuntimeError". If *wait* is "True" then this method will not return until all the pending futures are done executing and the resources associated with the executor have been freed. If *wait* is "False" then this method will return immediately and the resources associated with the executor will be freed when all pending futures are done executing. Regardless of the value of *wait*, the entire Python program will not exit until all pending futures are done executing. 
If *cancel_futures* is "True", this method will cancel all pending futures that the executor has not started running. Any futures that are completed or running won’t be cancelled, regardless of the value of *cancel_futures*. If both *cancel_futures* and *wait* are "True", all futures that the executor has started running will be completed prior to this method returning. The remaining futures are cancelled. You can avoid having to call this method explicitly if you use the "with" statement, which will shut down the "Executor" (waiting as if "Executor.shutdown()" were called with *wait* set to "True"): import shutil with ThreadPoolExecutor(max_workers=4) as e: e.submit(shutil.copy, 'src1.txt', 'dest1.txt') e.submit(shutil.copy, 'src2.txt', 'dest2.txt') e.submit(shutil.copy, 'src3.txt', 'dest3.txt') e.submit(shutil.copy, 'src4.txt', 'dest4.txt') Changed in version 3.9: Added *cancel_futures*. ThreadPoolExecutor ================== "ThreadPoolExecutor" is an "Executor" subclass that uses a pool of threads to execute calls asynchronously. Deadlocks can occur when the callable associated with a "Future" waits on the results of another "Future". For example: from concurrent.futures import ThreadPoolExecutor import time def wait_on_b(): time.sleep(5) print(b.result()) # b will never complete because it is waiting on a. return 5 def wait_on_a(): time.sleep(5) print(a.result()) # a will never complete because it is waiting on b. return 6 executor = ThreadPoolExecutor(max_workers=2) a = executor.submit(wait_on_b) b = executor.submit(wait_on_a) And: def wait_on_future(): f = executor.submit(pow, 5, 2) # This will never complete because there is only one worker thread and # it is executing this function. print(f.result()) executor = ThreadPoolExecutor(max_workers=1) executor.submit(wait_on_future) class concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='', initializer=None, initargs=()) An "Executor" subclass that uses a pool of at most *max_workers* threads to execute calls asynchronously. All threads enqueued to "ThreadPoolExecutor" will be joined before the interpreter can exit. Note that the exit handler which does this is executed *before* any exit handlers added using "atexit". This means exceptions in the main thread must be caught and handled in order to signal threads to exit gracefully. For this reason, it is recommended that "ThreadPoolExecutor" not be used for long-running tasks. *initializer* is an optional callable that is called at the start of each worker thread; *initargs* is a tuple of arguments passed to the initializer. Should *initializer* raise an exception, all currently pending jobs will raise a "BrokenThreadPool", as well as any attempt to submit more jobs to the pool. Changed in version 3.5: If *max_workers* is "None" or not given, it will default to the number of processors on the machine, multiplied by "5", assuming that "ThreadPoolExecutor" is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for "ProcessPoolExecutor". Changed in version 3.6: Added the *thread_name_prefix* parameter to allow users to control the "threading.Thread" names for worker threads created by the pool for easier debugging. Changed in version 3.7: Added the *initializer* and *initargs* arguments. Changed in version 3.8: Default value of *max_workers* is changed to "min(32, os.cpu_count() + 4)". This default value preserves at least 5 workers for I/O bound tasks. It utilizes at most 32 CPU cores for CPU bound tasks which release the GIL.
And it avoids using very large resources implicitly on many-core machines. "ThreadPoolExecutor" now reuses idle worker threads before starting *max_workers* worker threads too. Changed in version 3.13: Default value of *max_workers* is changed to "min(32, (os.process_cpu_count() or 1) + 4)". ThreadPoolExecutor Example -------------------------- import concurrent.futures import urllib.request URLS = ['http://www.foxnews.com/', 'http://www.cnn.com/', 'http://europe.wsj.com/', 'http://www.bbc.co.uk/', 'http://nonexistent-subdomain.python.org/'] # Retrieve a single page and report the URL and contents def load_url(url, timeout): with urllib.request.urlopen(url, timeout=timeout) as conn: return conn.read() # We can use a with statement to ensure threads are cleaned up promptly with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor: # Start the load operations and mark each future with its URL future_to_url = {executor.submit(load_url, url, 60): url for url in URLS} for future in concurrent.futures.as_completed(future_to_url): url = future_to_url[future] try: data = future.result() except Exception as exc: print('%r generated an exception: %s' % (url, exc)) else: print('%r page is %d bytes' % (url, len(data))) ProcessPoolExecutor =================== The "ProcessPoolExecutor" class is an "Executor" subclass that uses a pool of processes to execute calls asynchronously. "ProcessPoolExecutor" uses the "multiprocessing" module, which allows it to side-step the *Global Interpreter Lock* but also means that only picklable objects can be executed and returned. The "__main__" module must be importable by worker subprocesses. This means that "ProcessPoolExecutor" will not work in the interactive interpreter. Calling "Executor" or "Future" methods from a callable submitted to a "ProcessPoolExecutor" will result in deadlock. class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None, initializer=None, initargs=(), max_tasks_per_child=None) An "Executor" subclass that executes calls asynchronously using a pool of at most *max_workers* processes. If *max_workers* is "None" or not given, it will default to "os.process_cpu_count()". If *max_workers* is less than or equal to "0", then a "ValueError" will be raised. On Windows, *max_workers* must be less than or equal to "61". If it is not then "ValueError" will be raised. If *max_workers* is "None", then the default chosen will be at most "61", even if more processors are available. *mp_context* can be a "multiprocessing" context or "None". It will be used to launch the workers. If *mp_context* is "None" or not given, the default "multiprocessing" context is used. See Contexts and start methods. *initializer* is an optional callable that is called at the start of each worker process; *initargs* is a tuple of arguments passed to the initializer. Should *initializer* raise an exception, all currently pending jobs will raise a "BrokenProcessPool", as well as any attempt to submit more jobs to the pool. *max_tasks_per_child* is an optional argument that specifies the maximum number of tasks a single process can execute before it will exit and be replaced with a fresh worker process. By default *max_tasks_per_child* is "None" which means worker processes will live as long as the pool. When a max is specified, the “spawn” multiprocessing start method will be used by default in absence of a *mp_context* parameter. This feature is incompatible with the “fork” start method.
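A small, illustrative constructor sketch tying these parameters together (the specific numbers are arbitrary, not recommendations):

   import multiprocessing
   from concurrent.futures import ProcessPoolExecutor

   # Request the "spawn" start method explicitly and recycle each worker
   # process after it has executed 100 tasks.
   executor = ProcessPoolExecutor(
       max_workers=4,
       mp_context=multiprocessing.get_context('spawn'),
       max_tasks_per_child=100,
   )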
Changed in version 3.3: When one of the worker processes terminates abruptly, a "BrokenProcessPool" error is now raised. Previously, behaviour was undefined but operations on the executor or its futures would often freeze or deadlock. Changed in version 3.7: The *mp_context* argument was added to allow users to control the start_method for worker processes created by the pool. Added the *initializer* and *initargs* arguments. Note: The default "multiprocessing" start method (see Contexts and start methods) will change away from *fork* in Python 3.14. Code that requires *fork* to be used for its "ProcessPoolExecutor" should explicitly specify that by passing a "mp_context=multiprocessing.get_context("fork")" parameter. Changed in version 3.11: The *max_tasks_per_child* argument was added to allow users to control the lifetime of workers in the pool. Changed in version 3.12: On POSIX systems, if your application has multiple threads and the "multiprocessing" context uses the "fork" start method: The "os.fork()" function called internally to spawn workers may raise a "DeprecationWarning". Pass a *mp_context* configured to use a different start method. See the "os.fork()" documentation for further explanation. Changed in version 3.13: *max_workers* uses "os.process_cpu_count()" by default, instead of "os.cpu_count()". ProcessPoolExecutor Example --------------------------- import concurrent.futures import math PRIMES = [ 112272535095293, 112582705942171, 112272535095293, 115280095190773, 115797848077099, 1099726899285419] def is_prime(n): if n < 2: return False if n == 2: return True if n % 2 == 0: return False sqrt_n = int(math.floor(math.sqrt(n))) for i in range(3, sqrt_n + 1, 2): if n % i == 0: return False return True def main(): with concurrent.futures.ProcessPoolExecutor() as executor: for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)): print('%d is prime: %s' % (number, prime)) if __name__ == '__main__': main() Future Objects ============== The "Future" class encapsulates the asynchronous execution of a callable. "Future" instances are created by "Executor.submit()". class concurrent.futures.Future Encapsulates the asynchronous execution of a callable. "Future" instances are created by "Executor.submit()" and should not be created directly except for testing. cancel() Attempt to cancel the call. If the call is currently being executed or finished running and cannot be cancelled then the method will return "False", otherwise the call will be cancelled and the method will return "True". cancelled() Return "True" if the call was successfully cancelled. running() Return "True" if the call is currently being executed and cannot be cancelled. done() Return "True" if the call was successfully cancelled or finished running. result(timeout=None) Return the value returned by the call. If the call hasn’t yet completed then this method will wait up to *timeout* seconds. If the call hasn’t completed in *timeout* seconds, then a "TimeoutError" will be raised. *timeout* can be an int or float. If *timeout* is not specified or "None", there is no limit to the wait time. If the future is cancelled before completing then "CancelledError" will be raised. If the call raised an exception, this method will raise the same exception. exception(timeout=None) Return the exception raised by the call. If the call hasn’t yet completed then this method will wait up to *timeout* seconds. If the call hasn’t completed in *timeout* seconds, then a "TimeoutError" will be raised.
      *timeout* can be an int or float. If *timeout* is not specified or "None", there is no limit to the wait time.

      If the future is cancelled before completing then "CancelledError" will be raised.

      If the call completed without raising, "None" is returned.

   add_done_callback(fn)

      Attaches the callable *fn* to the future. *fn* will be called, with the future as its only argument, when the future is cancelled or finishes running.

      Added callables are called in the order that they were added and are always called in a thread belonging to the process that added them. If the callable raises an "Exception" subclass, it will be logged and ignored. If the callable raises a "BaseException" subclass, the behavior is undefined.

      If the future has already completed or been cancelled, *fn* will be called immediately.

   The following "Future" methods are meant for use in unit tests and "Executor" implementations.

   set_running_or_notify_cancel()

      This method should only be called by "Executor" implementations before executing the work associated with the "Future" and by unit tests.

      If the method returns "False" then the "Future" was cancelled, i.e. "Future.cancel()" was called and returned "True". Any threads waiting on the "Future" completing (i.e. through "as_completed()" or "wait()") will be woken up.

      If the method returns "True" then the "Future" was not cancelled and has been put in the running state, i.e. calls to "Future.running()" will return "True".

      This method can only be called once and cannot be called after "Future.set_result()" or "Future.set_exception()" have been called.

   set_result(result)

      Sets the result of the work associated with the "Future" to *result*.

      This method should only be used by "Executor" implementations and unit tests.

      Changed in version 3.8: This method raises "concurrent.futures.InvalidStateError" if the "Future" is already done.

   set_exception(exception)

      Sets the result of the work associated with the "Future" to the "Exception" *exception*.

      This method should only be used by "Executor" implementations and unit tests.

      Changed in version 3.8: This method raises "concurrent.futures.InvalidStateError" if the "Future" is already done.

Module Functions
================

concurrent.futures.wait(fs, timeout=None, return_when=ALL_COMPLETED)

   Wait for the "Future" instances (possibly created by different "Executor" instances) given by *fs* to complete. Duplicate futures given to *fs* are removed and will be returned only once. Returns a named 2-tuple of sets. The first set, named "done", contains the futures that completed (finished or cancelled futures) before the wait completed. The second set, named "not_done", contains the futures that did not complete (pending or running futures).

   *timeout* can be used to control the maximum number of seconds to wait before returning. *timeout* can be an int or float. If *timeout* is not specified or "None", there is no limit to the wait time.

   *return_when* indicates when this function should return. It must be one of the following constants:

   +--------------------------------------+----------------------------------------------------+
   | Constant                             | Description                                        |
   |======================================|====================================================|
   | concurrent.futures.FIRST_COMPLETED   | The function will return when any future finishes  |
   |                                      | or is cancelled.                                   |
   +--------------------------------------+----------------------------------------------------+
   | concurrent.futures.FIRST_EXCEPTION   | The function will return when any future finishes  |
   |                                      | by raising an exception. If no future raises an    |
   |                                      | exception then it is equivalent to                 |
   |                                      | "ALL_COMPLETED".                                   |
   +--------------------------------------+----------------------------------------------------+
   | concurrent.futures.ALL_COMPLETED     | The function will return when all futures finish   |
   |                                      | or are cancelled.                                  |
   +--------------------------------------+----------------------------------------------------+

concurrent.futures.as_completed(fs, timeout=None)

   Returns an iterator over the "Future" instances (possibly created by different "Executor" instances) given by *fs* that yields futures as they complete (finished or cancelled futures). Any futures given by *fs* that are duplicated will be returned once. Any futures that completed before "as_completed()" is called will be yielded first. The returned iterator raises a "TimeoutError" if "__next__()" is called and the result isn’t available after *timeout* seconds from the original call to "as_completed()". *timeout* can be an int or float. If *timeout* is not specified or "None", there is no limit to the wait time.

See also:

  **PEP 3148** – futures - execute computations asynchronously
     The proposal which described this feature for inclusion in the Python standard library.

Exception classes
=================

exception concurrent.futures.CancelledError

   Raised when a future is cancelled.

exception concurrent.futures.TimeoutError

   A deprecated alias of "TimeoutError", raised when a future operation exceeds the given timeout.

   Changed in version 3.11: This class was made an alias of "TimeoutError".

exception concurrent.futures.BrokenExecutor

   Derived from "RuntimeError", this exception class is raised when an executor is broken for some reason, and cannot be used to submit or execute new tasks.

   Added in version 3.7.

exception concurrent.futures.InvalidStateError

   Raised when an operation is performed on a future that is not allowed in the current state.

   Added in version 3.8.

exception concurrent.futures.thread.BrokenThreadPool

   Derived from "BrokenExecutor", this exception class is raised when one of the workers of a "ThreadPoolExecutor" has failed initializing.

   Added in version 3.7.

exception concurrent.futures.process.BrokenProcessPool

   Derived from "BrokenExecutor" (formerly "RuntimeError"), this exception class is raised when one of the workers of a "ProcessPoolExecutor" has terminated in a non-clean fashion (for example, if it was killed from the outside).

   Added in version 3.3.

The "concurrent" package
************************

Currently, there is only one module in this package:

* "concurrent.futures" – Launching parallel tasks

"configparser" — Configuration file parser
******************************************

**Source code:** Lib/configparser.py

======================================================================

This module provides the "ConfigParser" class which implements a basic configuration language which provides a structure similar to what’s found in Microsoft Windows INI files. You can use this to write Python programs which can be customized by end users easily.

Note: This library does *not* interpret or write the value-type prefixes used in the Windows Registry extended version of INI syntax.

See also:

  Module "tomllib"
     TOML is a well-specified format for application configuration files.
     It is specifically designed to be an improved version of INI.

  Module "shlex"
     Support for creating Unix shell-like mini-languages which can also be used for application configuration files.

  Module "json"
     The "json" module implements a subset of JavaScript syntax which is sometimes used for configuration, but does not support comments.

Quick Start
===========

Let’s take a very basic configuration file that looks like this:

   [DEFAULT]
   ServerAliveInterval = 45
   Compression = yes
   CompressionLevel = 9
   ForwardX11 = yes

   [forge.example]
   User = hg

   [topsecret.server.example]
   Port = 50022
   ForwardX11 = no

The structure of INI files is described in the following section. Essentially, the file consists of sections, each of which contains keys with values. "configparser" classes can read and write such files. Let’s start by creating the above configuration file programmatically.

   >>> import configparser
   >>> config = configparser.ConfigParser()
   >>> config['DEFAULT'] = {'ServerAliveInterval': '45',
   ...                      'Compression': 'yes',
   ...                      'CompressionLevel': '9'}
   >>> config['forge.example'] = {}
   >>> config['forge.example']['User'] = 'hg'
   >>> config['topsecret.server.example'] = {}
   >>> topsecret = config['topsecret.server.example']
   >>> topsecret['Port'] = '50022'     # mutates the parser
   >>> topsecret['ForwardX11'] = 'no'  # same here
   >>> config['DEFAULT']['ForwardX11'] = 'yes'
   >>> with open('example.ini', 'w') as configfile:
   ...   config.write(configfile)
   ...

As you can see, we can treat a config parser much like a dictionary. There are differences, outlined later, but the behavior is very close to what you would expect from a dictionary.

Now that we have created and saved a configuration file, let’s read it back and explore the data it holds.

   >>> config = configparser.ConfigParser()
   >>> config.sections()
   []
   >>> config.read('example.ini')
   ['example.ini']
   >>> config.sections()
   ['forge.example', 'topsecret.server.example']
   >>> 'forge.example' in config
   True
   >>> 'python.org' in config
   False
   >>> config['forge.example']['User']
   'hg'
   >>> config['DEFAULT']['Compression']
   'yes'
   >>> topsecret = config['topsecret.server.example']
   >>> topsecret['ForwardX11']
   'no'
   >>> topsecret['Port']
   '50022'
   >>> for key in config['forge.example']:
   ...     print(key)
   user
   compressionlevel
   serveraliveinterval
   compression
   forwardx11
   >>> config['forge.example']['ForwardX11']
   'yes'

As we can see above, the API is pretty straightforward. The only bit of magic involves the "DEFAULT" section which provides default values for all other sections [1]. Note also that keys in sections are case-insensitive and stored in lowercase [1].

It is possible to read several configurations into a single "ConfigParser", where the most recently added configuration has the highest priority. Any conflicting keys are taken from the more recent configuration while the previously existing keys are retained. The example below reads in an "override.ini" file, which will override any conflicting keys from the "example.ini" file.

   [DEFAULT]
   ServerAliveInterval = -1

   >>> config_override = configparser.ConfigParser()
   >>> config_override['DEFAULT'] = {'ServerAliveInterval': '-1'}
   >>> with open('override.ini', 'w') as configfile:
   ...     config_override.write(configfile)
   ...
   >>> config_override = configparser.ConfigParser()
   >>> config_override.read(['example.ini', 'override.ini'])
   ['example.ini', 'override.ini']
   >>> print(config_override.get('DEFAULT', 'ServerAliveInterval'))
   -1

This behaviour is equivalent to a "ConfigParser.read()" call with several files passed to the *filenames* parameter.
Supported Datatypes
===================

Config parsers do not guess datatypes of values in configuration files, always storing them internally as strings. This means that if you need other datatypes, you should convert on your own:

   >>> int(topsecret['Port'])
   50022
   >>> float(topsecret['CompressionLevel'])
   9.0

Since this task is so common, config parsers provide a range of handy getter methods to handle integers, floats and booleans. The last one is the most interesting because simply passing the value to "bool()" would do no good since "bool('False')" is still "True". This is why config parsers also provide "getboolean()". This method is case-insensitive and recognizes Boolean values from "'yes'"/"'no'", "'on'"/"'off'", "'true'"/"'false'" and "'1'"/"'0'" [1]. For example:

   >>> topsecret.getboolean('ForwardX11')
   False
   >>> config['forge.example'].getboolean('ForwardX11')
   True
   >>> config.getboolean('forge.example', 'Compression')
   True

Apart from "getboolean()", config parsers also provide equivalent "getint()" and "getfloat()" methods. You can register your own converters and customize the provided ones. [1]

Fallback Values
===============

As with a dictionary, you can use a section’s "get()" method to provide fallback values:

   >>> topsecret.get('Port')
   '50022'
   >>> topsecret.get('CompressionLevel')
   '9'
   >>> topsecret.get('Cipher')
   >>> topsecret.get('Cipher', '3des-cbc')
   '3des-cbc'

Please note that default values have precedence over fallback values. For instance, in our example the "'CompressionLevel'" key was specified only in the "'DEFAULT'" section. If we try to get it from the section "'topsecret.server.example'", we will always get the default, even if we specify a fallback:

   >>> topsecret.get('CompressionLevel', '3')
   '9'

One more thing to be aware of is that the parser-level "get()" method provides a custom, more complex interface, maintained for backwards compatibility. When using this method, a fallback value can be provided via the "fallback" keyword-only argument:

   >>> config.get('forge.example', 'monster',
   ...            fallback='No such things as monsters')
   'No such things as monsters'

The same "fallback" argument can be used with the "getint()", "getfloat()" and "getboolean()" methods, for example:

   >>> 'BatchMode' in topsecret
   False
   >>> topsecret.getboolean('BatchMode', fallback=True)
   True
   >>> config['DEFAULT']['BatchMode'] = 'no'
   >>> topsecret.getboolean('BatchMode', fallback=True)
   False

Supported INI File Structure
============================

A configuration file consists of sections, each led by a "[section]" header, followed by key/value entries separated by a specific string ("=" or ":" by default [1]). By default, section names are case sensitive but keys are not [1]. Leading and trailing whitespace is removed from keys and values. Values can be omitted if the parser is configured to allow it [1], in which case the key/value delimiter may also be left out. Values can also span multiple lines, as long as they are indented deeper than the first line of the value. Depending on the parser’s mode, blank lines may be treated as parts of multiline values or ignored.

By default, a valid section name can be any string that does not contain ‘\n’. To change this, see "ConfigParser.SECTCRE".

The first section name may be omitted if the parser is configured to allow an unnamed top level section with "allow_unnamed_section=True". In this case, the keys/values may be retrieved by "UNNAMED_SECTION" as in "config[UNNAMED_SECTION]".
Configuration files may include comments, prefixed by specific characters ("#" and ";" by default [1]). Comments may appear on their own on an otherwise empty line, possibly indented. [1]

For example:

   [Simple Values]
   key=value
   spaces in keys=allowed
   spaces in values=allowed as well
   spaces around the delimiter = obviously
   you can also use : to delimit keys from values

   [All Values Are Strings]
   values like this: 1000000
   or this: 3.14159265359
   are they treated as numbers? : no
   integers, floats and booleans are held as: strings
   can use the API to get converted values directly: true

   [Multiline Values]
   chorus: I'm a lumberjack, and I'm okay
       I sleep all night and I work all day

   [No Values]
   key_without_value
   empty string value here =

   [You can use comments]
   # like this
   ; or this

   # By default only in an empty line.
   # Inline comments can be harmful because they prevent users
   # from using the delimiting characters as parts of values.
   # That being said, this can be customized.

   [Sections Can Be Indented]
       can_values_be_as_well = True
       does_that_mean_anything_special = False
       purpose = formatting for readability
       multiline_values = are handled just fine as
           long as they are indented deeper
           than the first line of a value
       # Did I mention we can indent comments, too?

Unnamed Sections
================

The name of the first section (or the only section, if there is just one) may be omitted and its values retrieved by the "UNNAMED_SECTION" attribute.

   >>> config = """
   ... option = value
   ...
   ... [ Section 2 ]
   ... another = val
   ... """
   >>> unnamed = configparser.ConfigParser(allow_unnamed_section=True)
   >>> unnamed.read_string(config)
   >>> unnamed.get(configparser.UNNAMED_SECTION, 'option')
   'value'

Interpolation of values
=======================

On top of the core functionality, "ConfigParser" supports interpolation. This means values can be preprocessed before returning them from "get()" calls.

class configparser.BasicInterpolation

   The default implementation used by "ConfigParser". It enables values to contain format strings which refer to other values in the same section, or values in the special default section [1]. Additional default values can be provided on initialization.

   For example:

      [Paths]
      home_dir: /Users
      my_dir: %(home_dir)s/lumberjack
      my_pictures: %(my_dir)s/Pictures

      [Escape]
      # use a %% to escape the % sign (% is the only character that needs to be escaped):
      gain: 80%%

   In the example above, "ConfigParser" with *interpolation* set to "BasicInterpolation()" would resolve "%(home_dir)s" to the value of "home_dir" ("/Users" in this case). "%(my_dir)s" in effect would resolve to "/Users/lumberjack". All interpolations are done on demand so keys used in the chain of references do not have to be specified in any specific order in the configuration file.

   With "interpolation" set to "None", the parser would simply return "%(my_dir)s/Pictures" as the value of "my_pictures" and "%(home_dir)s/lumberjack" as the value of "my_dir".

class configparser.ExtendedInterpolation

   An alternative handler for interpolation which implements a more advanced syntax, used for instance in "zc.buildout". Extended interpolation uses "${section:option}" to denote a value from a foreign section. Interpolation can span multiple levels. For convenience, if the "section:" part is omitted, interpolation defaults to the current section (and possibly the default values from the special section).
   For example, the configuration specified above with basic interpolation would look like this with extended interpolation:

      [Paths]
      home_dir: /Users
      my_dir: ${home_dir}/lumberjack
      my_pictures: ${my_dir}/Pictures

      [Escape]
      # use a $$ to escape the $ sign ($ is the only character that needs to be escaped):
      cost: $$80

   Values from other sections can be fetched as well:

      [Common]
      home_dir: /Users
      library_dir: /Library
      system_dir: /System
      macports_dir: /opt/local

      [Frameworks]
      Python: 3.2
      path: ${Common:system_dir}/Library/Frameworks/

      [Arthur]
      nickname: Two Sheds
      last_name: Jackson
      my_dir: ${Common:home_dir}/twosheds
      my_pictures: ${my_dir}/Pictures
      python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python}

Mapping Protocol Access
=======================

Added in version 3.2.

Mapping protocol access is a generic name for functionality that enables using custom objects as if they were dictionaries. In case of "configparser", the mapping interface implementation is using the "parser['section']['option']" notation.

"parser['section']" in particular returns a proxy for the section’s data in the parser. This means that the values are not copied but they are taken from the original parser on demand. What’s even more important is that when values are changed on a section proxy, they are actually mutated in the original parser.

"configparser" objects behave as close to actual dictionaries as possible. The mapping interface is complete and adheres to the "MutableMapping" ABC. However, there are a few differences that should be taken into account:

* By default, all keys in sections are accessible in a case-insensitive manner [1]. E.g. "for option in parser["section"]" yields only "optionxform"’ed option key names. This means lowercased keys by default. At the same time, for a section that holds the key "'a'", both expressions return "True":

     "a" in parser["section"]
     "A" in parser["section"]

* All sections include "DEFAULTSECT" values as well which means that ".clear()" on a section may not leave the section visibly empty. This is because default values cannot be deleted from the section (because technically they are not there). If they are overridden in the section, deleting causes the default value to be visible again. Trying to delete a default value causes a "KeyError".

* "DEFAULTSECT" cannot be removed from the parser:

  * trying to delete it raises "ValueError",

  * "parser.clear()" leaves it intact,

  * "parser.popitem()" never returns it.

* "parser.get(section, option, **kwargs)" - the second argument is **not** a fallback value. Note however that the section-level "get()" methods are compatible both with the mapping protocol and the classic configparser API.

* "parser.items()" is compatible with the mapping protocol (returns a list of *section_name*, *section_proxy* pairs including the DEFAULTSECT). However, this method can also be invoked with arguments: "parser.items(section, raw, vars)". The latter call returns a list of *option*, *value* pairs for a specified "section", with all interpolations expanded (unless "raw=True" is provided).

The mapping protocol is implemented on top of the existing legacy API so that subclasses overriding the original interface still should have mappings working as expected.

Customizing Parser Behaviour
============================

There are nearly as many INI format variants as there are applications using it. "configparser" goes a long way to provide support for the largest sensible set of INI styles available.
The default functionality is mainly dictated by historical background and it’s very likely that you will want to customize some of the features.

The most common way to change the way a specific config parser works is to use the "__init__()" options:

* *defaults*, default value: "None"

  This option accepts a dictionary of key-value pairs which will be initially put in the "DEFAULT" section. This makes for an elegant way to support concise configuration files that don’t specify values which are the same as the documented default.

  Hint: if you want to specify default values for a specific section, use "read_dict()" before you read the actual file.

* *dict_type*, default value: "dict"

  This option has a major impact on how the mapping protocol will behave and how the written configuration files look. With the standard dictionary, every section is stored in the order it was added to the parser. The same goes for options within sections.

  An alternative dictionary type can be used for example to sort sections and options on write-back.

  Please note: there are ways to add a set of key-value pairs in a single operation. When you use a regular dictionary in those operations, the keys are added in insertion order. For example:

     >>> parser = configparser.ConfigParser()
     >>> parser.read_dict({'section1': {'key1': 'value1',
     ...                                'key2': 'value2',
     ...                                'key3': 'value3'},
     ...                   'section2': {'keyA': 'valueA',
     ...                                'keyB': 'valueB',
     ...                                'keyC': 'valueC'},
     ...                   'section3': {'foo': 'x',
     ...                                'bar': 'y',
     ...                                'baz': 'z'}
     ... })
     >>> parser.sections()
     ['section1', 'section2', 'section3']
     >>> [option for option in parser['section3']]
     ['foo', 'bar', 'baz']

* *allow_no_value*, default value: "False"

  Some configuration files are known to include settings without values, but which otherwise conform to the syntax supported by "configparser". The *allow_no_value* parameter to the constructor can be used to indicate that such values should be accepted:

     >>> import configparser
     >>> sample_config = """
     ... [mysqld]
     ... user = mysql
     ... pid-file = /var/run/mysqld/mysqld.pid
     ... skip-external-locking
     ... old_passwords = 1
     ... skip-bdb
     ... # we don't need ACID today
     ... skip-innodb
     ... """
     >>> config = configparser.ConfigParser(allow_no_value=True)
     >>> config.read_string(sample_config)

     >>> # Settings with values are treated as before:
     >>> config["mysqld"]["user"]
     'mysql'

     >>> # Settings without values provide None:
     >>> config["mysqld"]["skip-bdb"]

     >>> # Settings which aren't specified still raise an error:
     >>> config["mysqld"]["does-not-exist"]
     Traceback (most recent call last):
       ...
     KeyError: 'does-not-exist'

* *delimiters*, default value: "('=', ':')"

  Delimiters are substrings that delimit keys from values within a section. The first occurrence of a delimiting substring on a line is considered a delimiter. This means values (but not keys) can contain the delimiters.

  See also the *space_around_delimiters* argument to "ConfigParser.write()".

* *comment_prefixes*, default value: "('#', ';')"

* *inline_comment_prefixes*, default value: "None"

  Comment prefixes are strings that indicate the start of a valid comment within a config file. *comment_prefixes* are used only on otherwise empty lines (optionally indented) whereas *inline_comment_prefixes* can be used after every valid value (e.g. section names, options and empty lines as well). By default inline comments are disabled and "'#'" and "';'" are used as prefixes for whole line comments.
  Changed in version 3.2: In previous versions of "configparser" behaviour matched "comment_prefixes=('#',';')" and "inline_comment_prefixes=(';',)".

  Please note that config parsers don’t support escaping of comment prefixes so using *inline_comment_prefixes* may prevent users from specifying option values with characters used as comment prefixes. When in doubt, avoid setting *inline_comment_prefixes*. In any case, the only way of storing comment prefix characters at the beginning of a line in multiline values is to interpolate the prefix, for example:

     >>> from configparser import ConfigParser, ExtendedInterpolation
     >>> parser = ConfigParser(interpolation=ExtendedInterpolation())
     >>> # the default BasicInterpolation could be used as well
     >>> parser.read_string("""
     ... [DEFAULT]
     ... hash = #
     ...
     ... [hashes]
     ... shebang =
     ...   ${hash}!/usr/bin/env python
     ...   ${hash} -*- coding: utf-8 -*-
     ...
     ... extensions =
     ...   enabled_extension
     ...   another_extension
     ...   #disabled_by_comment
     ...   yet_another_extension
     ...
     ... interpolation not necessary = if # is not at line start
     ... even in multiline values = line #1
     ...   line #2
     ...   line #3
     ... """)
     >>> print(parser['hashes']['shebang'])
     #!/usr/bin/env python
     # -*- coding: utf-8 -*-
     >>> print(parser['hashes']['extensions'])
     enabled_extension
     another_extension
     yet_another_extension
     >>> print(parser['hashes']['interpolation not necessary'])
     if # is not at line start
     >>> print(parser['hashes']['even in multiline values'])
     line #1
     line #2
     line #3

* *strict*, default value: "True"

  When set to "True", the parser will not allow for any section or option duplicates while reading from a single source (using "read_file()", "read_string()" or "read_dict()"). It is recommended to use strict parsers in new applications.

  Changed in version 3.2: In previous versions of "configparser" behaviour matched "strict=False".

* *empty_lines_in_values*, default value: "True"

  In config parsers, values can span multiple lines as long as they are indented more than the key that holds them. By default parsers also let empty lines be parts of values. At the same time, keys can be arbitrarily indented themselves to improve readability. In consequence, when configuration files get big and complex, it is easy for the user to lose track of the file structure. Take for instance:

     [Section]
     key = multiline
       value with a gotcha

      this = is still a part of the multiline value of 'key'

  This can be especially problematic for the user to see if they’re using a proportional font to edit the file. That is why, when your application does not need values with empty lines, you should consider disallowing them. This will make empty lines split keys every time. In the example above, it would produce two keys, "key" and "this".

* *default_section*, default value: "configparser.DEFAULTSECT" (that is: ""DEFAULT"")

  The convention of allowing a special section of default values for other sections or interpolation purposes is a powerful concept of this library, letting users create complex declarative configurations. This section is normally called ""DEFAULT"" but this can be customized to point to any other valid section name. Some typical values include: ""general"" or ""common"". The name provided is used for recognizing default sections when reading from any source and is used when writing configuration back to a file. Its current value can be retrieved using the "parser_instance.default_section" attribute and may be modified at runtime (i.e. to convert files from one format to another).
* *interpolation*, default value: "configparser.BasicInterpolation" Interpolation behaviour may be customized by providing a custom handler through the *interpolation* argument. "None" can be used to turn off interpolation completely, "ExtendedInterpolation()" provides a more advanced variant inspired by "zc.buildout". More on the subject in the dedicated documentation section. "RawConfigParser" has a default value of "None". * *converters*, default value: not set Config parsers provide option value getters that perform type conversion. By default "getint()", "getfloat()", and "getboolean()" are implemented. Should other getters be desirable, users may define them in a subclass or pass a dictionary where each key is a name of the converter and each value is a callable implementing said conversion. For instance, passing "{'decimal': decimal.Decimal}" would add "getdecimal()" on both the parser object and all section proxies. In other words, it will be possible to write both "parser_instance.getdecimal('section', 'key', fallback=0)" and "parser_instance['section'].getdecimal('key', 0)". If the converter needs to access the state of the parser, it can be implemented as a method on a config parser subclass. If the name of this method starts with "get", it will be available on all section proxies, in the dict-compatible form (see the "getdecimal()" example above). More advanced customization may be achieved by overriding default values of these parser attributes. The defaults are defined on the classes, so they may be overridden by subclasses or by attribute assignment. ConfigParser.BOOLEAN_STATES By default when using "getboolean()", config parsers consider the following values "True": "'1'", "'yes'", "'true'", "'on'" and the following values "False": "'0'", "'no'", "'false'", "'off'". You can override this by specifying a custom dictionary of strings and their Boolean outcomes. For example: >>> custom = configparser.ConfigParser() >>> custom['section1'] = {'funky': 'nope'} >>> custom['section1'].getboolean('funky') Traceback (most recent call last): ... ValueError: Not a boolean: nope >>> custom.BOOLEAN_STATES = {'sure': True, 'nope': False} >>> custom['section1'].getboolean('funky') False Other typical Boolean pairs include "accept"/"reject" or "enabled"/"disabled". ConfigParser.optionxform(option) This method transforms option names on every read, get, or set operation. The default converts the name to lowercase. This also means that when a configuration file gets written, all keys will be lowercase. Override this method if that’s unsuitable. For example: >>> config = """ ... [Section1] ... Key = Value ... ... [Section2] ... AnotherKey = Value ... """ >>> typical = configparser.ConfigParser() >>> typical.read_string(config) >>> list(typical['Section1'].keys()) ['key'] >>> list(typical['Section2'].keys()) ['anotherkey'] >>> custom = configparser.RawConfigParser() >>> custom.optionxform = lambda option: option >>> custom.read_string(config) >>> list(custom['Section1'].keys()) ['Key'] >>> list(custom['Section2'].keys()) ['AnotherKey'] Note: The optionxform function transforms option names to a canonical form. This should be an idempotent function: if the name is already in canonical form, it should be returned unchanged. ConfigParser.SECTCRE A compiled regular expression used to parse section headers. The default matches "[section]" to the name ""section"". Whitespace is considered part of the section name, thus "[ larch ]" will be read as a section of name "" larch "". 
   Override this attribute if that’s unsuitable. For example:

      >>> import re
      >>> config = """
      ... [Section 1]
      ... option = value
      ...
      ... [ Section 2 ]
      ... another = val
      ... """
      >>> typical = configparser.ConfigParser()
      >>> typical.read_string(config)
      >>> typical.sections()
      ['Section 1', ' Section 2 ']
      >>> custom = configparser.ConfigParser()
      >>> custom.SECTCRE = re.compile(r"\[ *(?P<header>[^]]+?) *\]")
      >>> custom.read_string(config)
      >>> custom.sections()
      ['Section 1', 'Section 2']

   Note: While ConfigParser objects also use an "OPTCRE" attribute for recognizing option lines, it’s not recommended to override it because that would interfere with constructor options *allow_no_value* and *delimiters*.

Legacy API Examples
===================

Mainly because of backwards compatibility concerns, "configparser" also provides a legacy API with explicit "get"/"set" methods. While there are valid use cases for the methods outlined below, mapping protocol access is preferred for new projects. The legacy API is at times more advanced, low-level and downright counterintuitive.

An example of writing to a configuration file:

   import configparser

   config = configparser.RawConfigParser()

   # Please note that using RawConfigParser's set functions, you can assign
   # non-string values to keys internally, but will receive an error when
   # attempting to write to a file or when you get it in non-raw mode. Setting
   # values using the mapping protocol or ConfigParser's set() does not allow
   # such assignments to take place.
   config.add_section('Section1')
   config.set('Section1', 'an_int', '15')
   config.set('Section1', 'a_bool', 'true')
   config.set('Section1', 'a_float', '3.1415')
   config.set('Section1', 'baz', 'fun')
   config.set('Section1', 'bar', 'Python')
   config.set('Section1', 'foo', '%(bar)s is %(baz)s!')

   # Writing our configuration file to 'example.cfg'
   with open('example.cfg', 'w') as configfile:
       config.write(configfile)

An example of reading the configuration file again:

   import configparser

   config = configparser.RawConfigParser()
   config.read('example.cfg')

   # getfloat() raises an exception if the value is not a float
   # getint() and getboolean() also do this for their respective types
   a_float = config.getfloat('Section1', 'a_float')
   an_int = config.getint('Section1', 'an_int')
   print(a_float + an_int)

   # Notice that the next output does not interpolate '%(bar)s' or '%(baz)s'.
   # This is because we are using a RawConfigParser().
   if config.getboolean('Section1', 'a_bool'):
       print(config.get('Section1', 'foo'))

To get interpolation, use "ConfigParser":

   import configparser

   cfg = configparser.ConfigParser()
   cfg.read('example.cfg')

   # Set the optional *raw* argument of get() to True if you wish to disable
   # interpolation in a single get operation.
   print(cfg.get('Section1', 'foo', raw=False))  # -> "Python is fun!"
   print(cfg.get('Section1', 'foo', raw=True))   # -> "%(bar)s is %(baz)s!"

   # The optional *vars* argument is a dict with members that will take
   # precedence in interpolation.
   print(cfg.get('Section1', 'foo', vars={'bar': 'Documentation',
                                          'baz': 'evil'}))

   # The optional *fallback* argument can be used to provide a fallback value
   print(cfg.get('Section1', 'foo'))
         # -> "Python is fun!"

   print(cfg.get('Section1', 'foo', fallback='Monty is not.'))
         # -> "Python is fun!"

   print(cfg.get('Section1', 'monster', fallback='No such things as monsters.'))
         # -> "No such things as monsters."

   # A bare print(cfg.get('Section1', 'monster')) would raise NoOptionError
   # but we can also use:

   print(cfg.get('Section1', 'monster', fallback=None))
         # -> None

Default values are available in both types of ConfigParsers. They are used in interpolation if an option used is not defined elsewhere.

   import configparser

   # New instance with 'bar' and 'baz' defaulting to 'Life' and 'hard' each
   config = configparser.ConfigParser({'bar': 'Life', 'baz': 'hard'})
   config.read('example.cfg')

   print(config.get('Section1', 'foo'))  # -> "Python is fun!"
config.remove_option('Section1', 'bar') config.remove_option('Section1', 'baz') print(config.get('Section1', 'foo')) # -> "Life is hard!" ConfigParser Objects ==================== class configparser.ConfigParser(defaults=None, dict_type=dict, allow_no_value=False, *, delimiters=('=', ':'), comment_prefixes=('#', ';'), inline_comment_prefixes=None, strict=True, empty_lines_in_values=True, default_section=configparser.DEFAULTSECT, interpolation=BasicInterpolation(), converters={}, allow_unnamed_section=False) The main configuration parser. When *defaults* is given, it is initialized into the dictionary of intrinsic defaults. When *dict_type* is given, it will be used to create the dictionary objects for the list of sections, for the options within a section, and for the default values. When *delimiters* is given, it is used as the set of substrings that divide keys from values. When *comment_prefixes* is given, it will be used as the set of substrings that prefix comments in otherwise empty lines. Comments can be indented. When *inline_comment_prefixes* is given, it will be used as the set of substrings that prefix comments in non-empty lines. When *strict* is "True" (the default), the parser won’t allow for any section or option duplicates while reading from a single source (file, string or dictionary), raising "DuplicateSectionError" or "DuplicateOptionError". When *empty_lines_in_values* is "False" (default: "True"), each empty line marks the end of an option. Otherwise, internal empty lines of a multiline option are kept as part of the value. When *allow_no_value* is "True" (default: "False"), options without values are accepted; the value held for these is "None" and they are serialized without the trailing delimiter. When *default_section* is given, it specifies the name for the special section holding default values for other sections and interpolation purposes (normally named ""DEFAULT""). This value can be retrieved and changed at runtime using the "default_section" instance attribute. This won’t re-evaluate an already parsed config file, but will be used when writing parsed settings to a new config file. Interpolation behaviour may be customized by providing a custom handler through the *interpolation* argument. "None" can be used to turn off interpolation completely, "ExtendedInterpolation()" provides a more advanced variant inspired by "zc.buildout". More on the subject in the dedicated documentation section. All option names used in interpolation will be passed through the "optionxform()" method just like any other option name reference. For example, using the default implementation of "optionxform()" (which converts option names to lower case), the values "foo %(bar)s" and "foo %(BAR)s" are equivalent. When *converters* is given, it should be a dictionary where each key represents the name of a type converter and each value is a callable implementing the conversion from string to the desired datatype. Every converter gets its own corresponding "get*()" method on the parser object and section proxies. When *allow_unnamed_section* is "True" (default: "False"), the first section name can be omitted. See the “Unnamed Sections” section. It is possible to read several configurations into a single "ConfigParser", where the most recently added configuration has the highest priority. Any conflicting keys are taken from the more recent configuration while the previously existing keys are retained. 
The example below reads in an "override.ini" file, which will override any conflicting keys from the "example.ini" file. [DEFAULT] ServerAliveInterval = -1 >>> config_override = configparser.ConfigParser() >>> config_override['DEFAULT'] = {'ServerAliveInterval': '-1'} >>> with open('override.ini', 'w') as configfile: ... config_override.write(configfile) ... >>> config_override = configparser.ConfigParser() >>> config_override.read(['example.ini', 'override.ini']) ['example.ini', 'override.ini'] >>> print(config_override.get('DEFAULT', 'ServerAliveInterval')) -1 Changed in version 3.1: The default *dict_type* is "collections.OrderedDict". Changed in version 3.2: *allow_no_value*, *delimiters*, *comment_prefixes*, *strict*, *empty_lines_in_values*, *default_section* and *interpolation* were added. Changed in version 3.5: The *converters* argument was added. Changed in version 3.7: The *defaults* argument is read with "read_dict()", providing consistent behavior across the parser: non-string keys and values are implicitly converted to strings. Changed in version 3.8: The default *dict_type* is "dict", since it now preserves insertion order. Changed in version 3.13: Raise a "MultilineContinuationError" when *allow_no_value* is "True", and a key without a value is continued with an indented line. Changed in version 3.13: The *allow_unnamed_section* argument was added. defaults() Return a dictionary containing the instance-wide defaults. sections() Return a list of the sections available; the *default section* is not included in the list. add_section(section) Add a section named *section* to the instance. If a section by the given name already exists, "DuplicateSectionError" is raised. If the *default section* name is passed, "ValueError" is raised. The name of the section must be a string; if not, "TypeError" is raised. Changed in version 3.2: Non-string section names raise "TypeError". has_section(section) Indicates whether the named *section* is present in the configuration. The *default section* is not acknowledged. options(section) Return a list of options available in the specified *section*. has_option(section, option) If the given *section* exists, and contains the given *option*, return "True"; otherwise return "False". If the specified *section* is "None" or an empty string, DEFAULT is assumed. read(filenames, encoding=None) Attempt to read and parse an iterable of filenames, returning a list of filenames which were successfully parsed. If *filenames* is a string, a "bytes" object or a *path-like object*, it is treated as a single filename. If a file named in *filenames* cannot be opened, that file will be ignored. This is designed so that you can specify an iterable of potential configuration file locations (for example, the current directory, the user’s home directory, and some system-wide directory), and all existing configuration files in the iterable will be read. If none of the named files exist, the "ConfigParser" instance will contain an empty dataset. An application which requires initial values to be loaded from a file should load the required file or files using "read_file()" before calling "read()" for any optional files: import configparser, os config = configparser.ConfigParser() config.read_file(open('defaults.cfg')) config.read(['site.cfg', os.path.expanduser('~/.myapp.cfg')], encoding='cp1250') Changed in version 3.2: Added the *encoding* parameter. Previously, all files were read using the default encoding for "open()". 
   Changed in version 3.6.1: The *filenames* parameter accepts a *path-like object*.

   Changed in version 3.7: The *filenames* parameter accepts a "bytes" object.

   read_file(f, source=None)

      Read and parse configuration data from *f* which must be an iterable yielding Unicode strings (for example files opened in text mode).

      Optional argument *source* specifies the name of the file being read. If not given and *f* has a "name" attribute, that is used for *source*; the default is "'<???>'".

      Added in version 3.2: Replaces "readfp()".

   read_string(string, source='<string>')

      Parse configuration data from a string.

      Optional argument *source* specifies a context-specific name of the string passed. If not given, "'<string>'" is used. This should commonly be a filesystem path or a URL.

      Added in version 3.2.

   read_dict(dictionary, source='<dict>')

      Load configuration from any object that provides a dict-like "items()" method. Keys are section names, values are dictionaries with keys and values that should be present in the section. If the used dictionary type preserves order, sections and their keys will be added in order. Values are automatically converted to strings.

      Optional argument *source* specifies a context-specific name of the dictionary passed. If not given, "'<dict>'" is used.

      This method can be used to copy state between parsers.

      Added in version 3.2.

   get(section, option, *, raw=False, vars=None[, fallback])

      Get an *option* value for the named *section*. If *vars* is provided, it must be a dictionary. The *option* is looked up in *vars* (if provided), *section*, and in *DEFAULTSECT* in that order. If the key is not found and *fallback* is provided, it is used as a fallback value. "None" can be provided as a *fallback* value.

      All the "'%'" interpolations are expanded in the return values, unless the *raw* argument is true. Values for interpolation keys are looked up in the same manner as the option.

      Changed in version 3.2: Arguments *raw*, *vars* and *fallback* are keyword only to protect users from trying to use the third argument as the *fallback* fallback (especially when using the mapping protocol).

   getint(section, option, *, raw=False, vars=None[, fallback])

      A convenience method which coerces the *option* in the specified *section* to an integer. See "get()" for explanation of *raw*, *vars* and *fallback*.

   getfloat(section, option, *, raw=False, vars=None[, fallback])

      A convenience method which coerces the *option* in the specified *section* to a floating-point number. See "get()" for explanation of *raw*, *vars* and *fallback*.

   getboolean(section, option, *, raw=False, vars=None[, fallback])

      A convenience method which coerces the *option* in the specified *section* to a Boolean value. Note that the accepted values for the option are "'1'", "'yes'", "'true'", and "'on'", which cause this method to return "True", and "'0'", "'no'", "'false'", and "'off'", which cause it to return "False". These string values are checked in a case-insensitive manner. Any other value will cause it to raise "ValueError". See "get()" for explanation of *raw*, *vars* and *fallback*.

   items(raw=False, vars=None)
   items(section, raw=False, vars=None)

      When *section* is not given, return a list of *section_name*, *section_proxy* pairs, including DEFAULTSECT.

      Otherwise, return a list of *name*, *value* pairs for the options in the given *section*. Optional arguments have the same meaning as for the "get()" method.

      Changed in version 3.8: Items present in *vars* no longer appear in the result.
The previous behaviour mixed actual parser options with variables provided for interpolation. set(section, option, value) If the given section exists, set the given option to the specified value; otherwise raise "NoSectionError". *option* and *value* must be strings; if not, "TypeError" is raised. write(fileobject, space_around_delimiters=True) Write a representation of the configuration to the specified *file object*, which must be opened in text mode (accepting strings). This representation can be parsed by a future "read()" call. If *space_around_delimiters* is true, delimiters between keys and values are surrounded by spaces. Note: Comments in the original configuration file are not preserved when writing the configuration back. What is considered a comment, depends on the given values for *comment_prefix* and *inline_comment_prefix*. remove_option(section, option) Remove the specified *option* from the specified *section*. If the section does not exist, raise "NoSectionError". If the option existed to be removed, return "True"; otherwise return "False". remove_section(section) Remove the specified *section* from the configuration. If the section in fact existed, return "True". Otherwise return "False". optionxform(option) Transforms the option name *option* as found in an input file or as passed in by client code to the form that should be used in the internal structures. The default implementation returns a lower-case version of *option*; subclasses may override this or client code can set an attribute of this name on instances to affect this behavior. You don’t need to subclass the parser to use this method, you can also set it on an instance, to a function that takes a string argument and returns a string. Setting it to "str", for example, would make option names case sensitive: cfgparser = ConfigParser() cfgparser.optionxform = str Note that when reading configuration files, whitespace around the option names is stripped before "optionxform()" is called. configparser.UNNAMED_SECTION A special object representing a section name used to reference the unnamed section (see Unnamed Sections). configparser.MAX_INTERPOLATION_DEPTH The maximum depth for recursive interpolation for "get()" when the *raw* parameter is false. This is relevant only when the default *interpolation* is used. RawConfigParser Objects ======================= class configparser.RawConfigParser(defaults=None, dict_type=dict, allow_no_value=False, *, delimiters=('=', ':'), comment_prefixes=('#', ';'), inline_comment_prefixes=None, strict=True, empty_lines_in_values=True, default_section=configparser.DEFAULTSECT, interpolation=BasicInterpolation(), converters={}, allow_unnamed_section=False) Legacy variant of the "ConfigParser". It has interpolation disabled by default and allows for non-string section names, option names, and values via its unsafe "add_section" and "set" methods, as well as the legacy "defaults=" keyword argument handling. Changed in version 3.2: *allow_no_value*, *delimiters*, *comment_prefixes*, *strict*, *empty_lines_in_values*, *default_section* and *interpolation* were added. Changed in version 3.5: The *converters* argument was added. Changed in version 3.8: The default *dict_type* is "dict", since it now preserves insertion order. Changed in version 3.13: The *allow_unnamed_section* argument was added. Note: Consider using "ConfigParser" instead which checks types of the values to be stored internally. If you don’t want interpolation, you can use "ConfigParser(interpolation=None)". 
   add_section(section)

      Add a section named *section* to the instance. If a section by the given name already exists, "DuplicateSectionError" is raised. If the *default section* name is passed, "ValueError" is raised.

      Type of *section* is not checked which lets users create non-string named sections. This behaviour is unsupported and may cause internal errors.

   set(section, option, value)

      If the given section exists, set the given option to the specified value; otherwise raise "NoSectionError". While it is possible to use "RawConfigParser" (or "ConfigParser" with *raw* parameters set to true) for *internal* storage of non-string values, full functionality (including interpolation and output to files) can only be achieved using string values.

      This method lets users assign non-string values to keys internally. This behaviour is unsupported and will cause errors when attempting to write to a file or get it in non-raw mode. **Use the mapping protocol API** which does not allow such assignments to take place.

Exceptions
==========

exception configparser.Error

   Base class for all other "configparser" exceptions.

exception configparser.NoSectionError

   Exception raised when a specified section is not found.

exception configparser.DuplicateSectionError

   Exception raised if "add_section()" is called with the name of a section that is already present, or, in strict parsers, when a section is found more than once in a single input file, string or dictionary.

   Changed in version 3.2: Added the optional *source* and *lineno* attributes and parameters to "__init__()".

exception configparser.DuplicateOptionError

   Exception raised by strict parsers if a single option appears twice during reading from a single file, string or dictionary. This catches misspellings and case sensitivity-related errors, e.g. a dictionary may have two keys representing the same case-insensitive configuration key.

exception configparser.NoOptionError

   Exception raised when a specified option is not found in the specified section.

exception configparser.InterpolationError

   Base class for exceptions raised when problems occur performing string interpolation.

exception configparser.InterpolationDepthError

   Exception raised when string interpolation cannot be completed because the number of iterations exceeds "MAX_INTERPOLATION_DEPTH". Subclass of "InterpolationError".

exception configparser.InterpolationMissingOptionError

   Exception raised when an option referenced from a value does not exist. Subclass of "InterpolationError".

exception configparser.InterpolationSyntaxError

   Exception raised when the source text into which substitutions are made does not conform to the required syntax. Subclass of "InterpolationError".

exception configparser.MissingSectionHeaderError

   Exception raised when attempting to parse a file which has no section headers.

exception configparser.ParsingError

   Exception raised when errors occur attempting to parse a file.

   Changed in version 3.12: The "filename" attribute and "__init__()" constructor argument were removed. They have been available using the name "source" since 3.2.

exception configparser.MultilineContinuationError

   Exception raised when a key without a corresponding value is continued with an indented line.

   Added in version 3.13.

-[ Footnotes ]-

[1] Config parsers allow for heavy customization. If you are interested in changing the behaviour outlined by the footnote reference, consult the Customizing Parser Behaviour section.
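To round off the exception descriptions above, here is a minimal sketch of the strict-mode behaviour; the INI snippet is illustrative:

   import configparser

   parser = configparser.ConfigParser()  # strict=True is the default
   try:
       # The duplicate 'key' option below triggers DuplicateOptionError,
       # because strict parsers reject duplicates within a single source.
       parser.read_string(
           "[section]\n"
           "key = 1\n"
           "key = 2\n")
   except configparser.DuplicateOptionError as exc:
       print(exc)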
Built-in Constants
******************

A small number of constants live in the built-in namespace. They are:

False

   The false value of the "bool" type. Assignments to "False" are illegal and raise a "SyntaxError".

True

   The true value of the "bool" type. Assignments to "True" are illegal and raise a "SyntaxError".

None

   An object frequently used to represent the absence of a value, as when default arguments are not passed to a function. Assignments to "None" are illegal and raise a "SyntaxError". "None" is the sole instance of the "NoneType" type.

NotImplemented

   A special value which should be returned by the binary special methods (e.g. "__eq__()", "__lt__()", "__add__()", "__rsub__()", etc.) to indicate that the operation is not implemented with respect to the other type; may be returned by the in-place binary special methods (e.g. "__imul__()", "__iand__()", etc.) for the same purpose. It should not be evaluated in a boolean context. "NotImplemented" is the sole instance of the "types.NotImplementedType" type.

   Note: When a binary (or in-place) method returns "NotImplemented" the interpreter will try the reflected operation on the other type (or some other fallback, depending on the operator). If all attempts return "NotImplemented", the interpreter will raise an appropriate exception. Incorrectly returning "NotImplemented" will result in a misleading error message or the "NotImplemented" value being returned to Python code. See Implementing the arithmetic operations for examples; an illustrative sketch also appears at the end of this chapter.

   Caution: "NotImplemented" and "NotImplementedError" are not interchangeable. This constant should only be used as described above; see "NotImplementedError" for details on correct usage of the exception.

   Changed in version 3.9: Evaluating "NotImplemented" in a boolean context is deprecated. While it currently evaluates as true, it will emit a "DeprecationWarning". It will raise a "TypeError" in a future version of Python.

Ellipsis

   The same as the ellipsis literal "...". Special value used mostly in conjunction with extended slicing syntax for user-defined container data types. "Ellipsis" is the sole instance of the "types.EllipsisType" type.

__debug__

   This constant is true if Python was not started with an "-O" option. See also the "assert" statement.

Note: The names "None", "False", "True" and "__debug__" cannot be reassigned (assignments to them, even as an attribute name, raise "SyntaxError"), so they can be considered “true” constants.

Constants added by the "site" module
====================================

The "site" module (which is imported automatically during startup, except if the "-S" command-line option is given) adds several constants to the built-in namespace. They are useful for the interactive interpreter shell and should not be used in programs.

quit(code=None)
exit(code=None)

   Objects that when printed, print a message like “Use quit() or Ctrl-D (i.e. EOF) to exit”, and when called, raise "SystemExit" with the specified exit code.

help

   Object that when printed, prints the message “Type help() for interactive help, or help(object) for help about object.”, and when called, acts as described "elsewhere".

copyright
credits

   Objects that when printed or called, print the text of copyright or credits, respectively.

license

   Object that when printed, prints the message “Type license() to see the full license text”, and when called, displays the full license text in a pager-like fashion (one screen at a time).
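The sketch promised in the "NotImplemented" note above: a rich comparison that returns "NotImplemented" for operands it does not understand, so the interpreter can try the reflected operation (the "Money" class is purely illustrative):

   class Money:
       def __init__(self, amount):
           self.amount = amount

       def __eq__(self, other):
           if not isinstance(other, Money):
               # Defer to the other operand's __eq__ (or the default
               # identity fallback) instead of guessing an answer here.
               return NotImplemented
           return self.amount == other.amount

   print(Money(3) == Money(3))  # True: handled by Money.__eq__
   print(Money(3) == 3)         # False: both sides returned NotImplemented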
"contextlib" — Utilities for "with"-statement contexts ****************************************************** **Source code:** Lib/contextlib.py ====================================================================== This module provides utilities for common tasks involving the "with" statement. For more information see also Context Manager Types and With Statement Context Managers. Utilities ========= Functions and classes provided: class contextlib.AbstractContextManager An *abstract base class* for classes that implement "object.__enter__()" and "object.__exit__()". A default implementation for "object.__enter__()" is provided which returns "self" while "object.__exit__()" is an abstract method which by default returns "None". See also the definition of Context Manager Types. Added in version 3.6. class contextlib.AbstractAsyncContextManager An *abstract base class* for classes that implement "object.__aenter__()" and "object.__aexit__()". A default implementation for "object.__aenter__()" is provided which returns "self" while "object.__aexit__()" is an abstract method which by default returns "None". See also the definition of Asynchronous Context Managers. Added in version 3.7. @contextlib.contextmanager This function is a *decorator* that can be used to define a factory function for "with" statement context managers, without needing to create a class or separate "__enter__()" and "__exit__()" methods. While many objects natively support use in with statements, sometimes a resource needs to be managed that isn’t a context manager in its own right, and doesn’t implement a "close()" method for use with "contextlib.closing". An abstract example would be the following to ensure correct resource management: from contextlib import contextmanager @contextmanager def managed_resource(*args, **kwds): # Code to acquire resource, e.g.: resource = acquire_resource(*args, **kwds) try: yield resource finally: # Code to release resource, e.g.: release_resource(resource) The function can then be used like this: >>> with managed_resource(timeout=3600) as resource: ... # Resource is released at the end of this block, ... # even if code in the block raises an exception The function being decorated must return a *generator*-iterator when called. This iterator must yield exactly one value, which will be bound to the targets in the "with" statement’s "as" clause, if any. At the point where the generator yields, the block nested in the "with" statement is executed. The generator is then resumed after the block is exited. If an unhandled exception occurs in the block, it is reraised inside the generator at the point where the yield occurred. Thus, you can use a "try"…"except"…"finally" statement to trap the error (if any), or ensure that some cleanup takes place. If an exception is trapped merely in order to log it or to perform some action (rather than to suppress it entirely), the generator must reraise that exception. Otherwise the generator context manager will indicate to the "with" statement that the exception has been handled, and execution will resume with the statement immediately following the "with" statement. "contextmanager()" uses "ContextDecorator" so the context managers it creates can be used as decorators as well as in "with" statements. 
When used as a decorator, a new generator instance is implicitly created on each function call (this allows the otherwise “one-shot” context managers created by "contextmanager()" to meet the requirement that context managers support multiple invocations in order to be used as decorators). Changed in version 3.2: Use of "ContextDecorator". @contextlib.asynccontextmanager Similar to "contextmanager()", but creates an asynchronous context manager. This function is a *decorator* that can be used to define a factory function for "async with" statement asynchronous context managers, without needing to create a class or separate "__aenter__()" and "__aexit__()" methods. It must be applied to an *asynchronous generator* function. A simple example: from contextlib import asynccontextmanager @asynccontextmanager async def get_connection(): conn = await acquire_db_connection() try: yield conn finally: await release_db_connection(conn) async def get_all_users(): async with get_connection() as conn: return conn.query('SELECT ...') Added in version 3.7. Context managers defined with "asynccontextmanager()" can be used either as decorators or with "async with" statements: import time from contextlib import asynccontextmanager @asynccontextmanager async def timeit(): now = time.monotonic() try: yield finally: print(f'it took {time.monotonic() - now}s to run') @timeit() async def main(): # ... async code ... When used as a decorator, a new generator instance is implicitly created on each function call. This allows the otherwise “one-shot” context managers created by "asynccontextmanager()" to meet the requirement that context managers support multiple invocations in order to be used as decorators. Changed in version 3.10: Async context managers created with "asynccontextmanager()" can be used as decorators. contextlib.closing(thing) Return a context manager that closes *thing* upon completion of the block. This is basically equivalent to: from contextlib import contextmanager @contextmanager def closing(thing): try: yield thing finally: thing.close() And lets you write code like this: from contextlib import closing from urllib.request import urlopen with closing(urlopen('https://www.python.org')) as page: for line in page: print(line) without needing to explicitly close "page". Even if an error occurs, "page.close()" will be called when the "with" block is exited. Note: Most types managing resources support the *context manager* protocol, which closes *thing* on leaving the "with" statement. As such, "closing()" is most useful for third party types that don’t support context managers. This example is purely for illustration purposes, as "urlopen()" would normally be used in a context manager. contextlib.aclosing(thing) Return an async context manager that calls the "aclose()" method of *thing* upon completion of the block. This is basically equivalent to: from contextlib import asynccontextmanager @asynccontextmanager async def aclosing(thing): try: yield thing finally: await thing.aclose() Significantly, "aclosing()" supports deterministic cleanup of async generators when they happen to exit early by "break" or an exception. For example: from contextlib import aclosing async with aclosing(my_generator()) as values: async for value in values: if value == 42: break This pattern ensures that the generator’s async exit code is executed in the same context as its iterations (so that exceptions and context variables work as expected, and the exit code isn’t run after the lifetime of some task it depends on). 
Added in version 3.10. contextlib.nullcontext(enter_result=None) Return a context manager that returns *enter_result* from "__enter__", but otherwise does nothing. It is intended to be used as a stand-in for an optional context manager, for example: def myfunction(arg, ignore_exceptions=False): if ignore_exceptions: # Use suppress to ignore all exceptions. cm = contextlib.suppress(Exception) else: # Do not ignore any exceptions, cm has no effect. cm = contextlib.nullcontext() with cm: # Do something An example using *enter_result*: def process_file(file_or_path): if isinstance(file_or_path, str): # If string, open file cm = open(file_or_path) else: # Caller is responsible for closing file cm = nullcontext(file_or_path) with cm as file: # Perform processing on the file It can also be used as a stand-in for asynchronous context managers: async def send_http(session=None): if not session: # If no http session, create it with aiohttp cm = aiohttp.ClientSession() else: # Caller is responsible for closing the session cm = nullcontext(session) async with cm as session: # Send http requests with session Added in version 3.7. Changed in version 3.10: *asynchronous context manager* support was added. contextlib.suppress(*exceptions) Return a context manager that suppresses any of the specified exceptions if they occur in the body of a "with" statement and then resumes execution with the first statement following the end of the "with" statement. As with any other mechanism that completely suppresses exceptions, this context manager should be used only to cover very specific errors where silently continuing with program execution is known to be the right thing to do. For example: from contextlib import suppress with suppress(FileNotFoundError): os.remove('somefile.tmp') with suppress(FileNotFoundError): os.remove('someotherfile.tmp') This code is equivalent to: try: os.remove('somefile.tmp') except FileNotFoundError: pass try: os.remove('someotherfile.tmp') except FileNotFoundError: pass This context manager is reentrant. If the code within the "with" block raises a "BaseExceptionGroup", suppressed exceptions are removed from the group. Any exceptions of the group which are not suppressed are re-raised in a new group which is created using the original group’s "derive()" method. Added in version 3.4. Changed in version 3.12: "suppress" now supports suppressing exceptions raised as part of a "BaseExceptionGroup". contextlib.redirect_stdout(new_target) Context manager for temporarily redirecting "sys.stdout" to another file or file-like object. This tool adds flexibility to existing functions or classes whose output is hardwired to stdout. For example, the output of "help()" normally is sent to *sys.stdout*. You can capture that output in a string by redirecting the output to an "io.StringIO" object. The replacement stream is returned from the "__enter__" method and so is available as the target of the "with" statement: with redirect_stdout(io.StringIO()) as f: help(pow) s = f.getvalue() To send the output of "help()" to a file on disk, redirect the output to a regular file: with open('help.txt', 'w') as f: with redirect_stdout(f): help(pow) To send the output of "help()" to *sys.stderr*: with redirect_stdout(sys.stderr): help(pow) Note that the global side effect on "sys.stdout" means that this context manager is not suitable for use in library code and most threaded applications. It also has no effect on the output of subprocesses. However, it is still a useful approach for many utility scripts. 
This context manager is reentrant. Added in version 3.4. contextlib.redirect_stderr(new_target) Similar to "redirect_stdout()" but redirecting "sys.stderr" to another file or file-like object. This context manager is reentrant. Added in version 3.5. contextlib.chdir(path) Non parallel-safe context manager to change the current working directory. As this changes a global state, the working directory, it is not suitable for use in most threaded or async contexts. It is also not suitable for most non-linear code execution, like generators, where the program execution is temporarily relinquished – unless explicitly desired, you should not yield when this context manager is active. This is a simple wrapper around "os.chdir()"; it changes the current working directory upon entering and restores the old one on exit. A short usage sketch appears after the "AsyncContextDecorator" entry below. This context manager is reentrant. Added in version 3.11. class contextlib.ContextDecorator A base class that enables a context manager to also be used as a decorator. Context managers inheriting from "ContextDecorator" have to implement "__enter__" and "__exit__" as normal. "__exit__" retains its optional exception handling even when used as a decorator. "ContextDecorator" is used by "contextmanager()", so you get this functionality automatically. Example of "ContextDecorator": from contextlib import ContextDecorator class mycontext(ContextDecorator): def __enter__(self): print('Starting') return self def __exit__(self, *exc): print('Finishing') return False The class can then be used like this: >>> @mycontext() ... def function(): ... print('The bit in the middle') ... >>> function() Starting The bit in the middle Finishing >>> with mycontext(): ... print('The bit in the middle') ... Starting The bit in the middle Finishing This change is just syntactic sugar for any construct of the following form: def f(): with cm(): # Do stuff "ContextDecorator" lets you instead write: @cm() def f(): # Do stuff It makes it clear that the "cm" applies to the whole function, rather than just a piece of it (and saving an indentation level is nice, too). Existing context managers that already have a base class can be extended by using "ContextDecorator" as a mixin class: from contextlib import ContextDecorator class mycontext(ContextBaseClass, ContextDecorator): def __enter__(self): return self def __exit__(self, *exc): return False Note: As the decorated function must be able to be called multiple times, the underlying context manager must support use in multiple "with" statements. If this is not the case, then the original construct with the explicit "with" statement inside the function should be used. Added in version 3.2. class contextlib.AsyncContextDecorator Similar to "ContextDecorator" but only for asynchronous functions. Example of "AsyncContextDecorator": from asyncio import run from contextlib import AsyncContextDecorator class mycontext(AsyncContextDecorator): async def __aenter__(self): print('Starting') return self async def __aexit__(self, *exc): print('Finishing') return False The class can then be used like this: >>> @mycontext() ... async def function(): ... print('The bit in the middle') ... >>> run(function()) Starting The bit in the middle Finishing >>> async def function(): ... async with mycontext(): ... print('The bit in the middle') ... >>> run(function()) Starting The bit in the middle Finishing Added in version 3.10.
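As noted under "chdir()" above, a minimal usage sketch (the directory name is arbitrary):

   import os
   from contextlib import chdir

   print(os.getcwd())        # the original working directory
   with chdir('/tmp'):
       # The working directory is changed for the duration of the block.
       print(os.getcwd())    # /tmp (or its resolved path)
   print(os.getcwd())        # restored on exit, even after an exception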
class contextlib.ExitStack A context manager that is designed to make it easy to programmatically combine other context managers and cleanup functions, especially those that are optional or otherwise driven by input data. For example, a set of files may easily be handled in a single with statement as follows: with ExitStack() as stack: files = [stack.enter_context(open(fname)) for fname in filenames] # All opened files will automatically be closed at the end of # the with statement, even if attempts to open files later # in the list raise an exception The "__enter__()" method returns the "ExitStack" instance, and performs no additional operations. Each instance maintains a stack of registered callbacks that are called in reverse order when the instance is closed (either explicitly or implicitly at the end of a "with" statement). Note that callbacks are *not* invoked implicitly when the context stack instance is garbage collected. This stack model is used so that context managers that acquire their resources in their "__init__" method (such as file objects) can be handled correctly. Since registered callbacks are invoked in the reverse order of registration, this ends up behaving as if multiple nested "with" statements had been used with the registered set of callbacks. This even extends to exception handling - if an inner callback suppresses or replaces an exception, then outer callbacks will be passed arguments based on that updated state. This is a relatively low level API that takes care of the details of correctly unwinding the stack of exit callbacks. It provides a suitable foundation for higher level context managers that manipulate the exit stack in application specific ways. Added in version 3.3. enter_context(cm) Enters a new context manager and adds its "__exit__()" method to the callback stack. The return value is the result of the context manager’s own "__enter__()" method. These context managers may suppress exceptions just as they normally would if used directly as part of a "with" statement. Changed in version 3.11: Raises "TypeError" instead of "AttributeError" if *cm* is not a context manager. push(exit) Adds a context manager’s "__exit__()" method to the callback stack. As "__enter__" is *not* invoked, this method can be used to cover part of an "__enter__()" implementation with a context manager’s own "__exit__()" method. If passed an object that is not a context manager, this method assumes it is a callback with the same signature as a context manager’s "__exit__()" method and adds it directly to the callback stack. By returning true values, these callbacks can suppress exceptions the same way context manager "__exit__()" methods can. The passed in object is returned from the function, allowing this method to be used as a function decorator. callback(callback, /, *args, **kwds) Accepts an arbitrary callback function and arguments and adds it to the callback stack. Unlike the other methods, callbacks added this way cannot suppress exceptions (as they are never passed the exception details). The passed in callback is returned from the function, allowing this method to be used as a function decorator. pop_all() Transfers the callback stack to a fresh "ExitStack" instance and returns it. No callbacks are invoked by this operation - instead, they will now be invoked when the new stack is closed (either explicitly or implicitly at the end of a "with" statement). 
For example, a group of files can be opened as an “all or nothing” operation as follows: with ExitStack() as stack: files = [stack.enter_context(open(fname)) for fname in filenames] # Hold onto the close method, but don't call it yet. close_files = stack.pop_all().close # If opening any file fails, all previously opened files will be # closed automatically. If all files are opened successfully, # they will remain open even after the with statement ends. # close_files() can then be invoked explicitly to close them all. close() Immediately unwinds the callback stack, invoking callbacks in the reverse order of registration. For any context managers and exit callbacks registered, the arguments passed in will indicate that no exception occurred. class contextlib.AsyncExitStack An asynchronous context manager, similar to "ExitStack", that supports combining both synchronous and asynchronous context managers, as well as having coroutines for cleanup logic. The "close()" method is not implemented; "aclose()" must be used instead. async enter_async_context(cm) Similar to "ExitStack.enter_context()" but expects an asynchronous context manager. Changed in version 3.11: Raises "TypeError" instead of "AttributeError" if *cm* is not an asynchronous context manager. push_async_exit(exit) Similar to "ExitStack.push()" but expects either an asynchronous context manager or a coroutine function. push_async_callback(callback, /, *args, **kwds) Similar to "ExitStack.callback()" but expects a coroutine function. async aclose() Similar to "ExitStack.close()" but properly handles awaitables. Continuing the example for "asynccontextmanager()": async with AsyncExitStack() as stack: connections = [await stack.enter_async_context(get_connection()) for i in range(5)] # All opened connections will automatically be released at the end of # the async with statement, even if attempts to open a connection # later in the list raise an exception. Added in version 3.7. Examples and Recipes ==================== This section describes some examples and recipes for making effective use of the tools provided by "contextlib". Supporting a variable number of context managers ------------------------------------------------ The primary use case for "ExitStack" is the one given in the class documentation: supporting a variable number of context managers and other cleanup operations in a single "with" statement. The variability may come from the number of context managers needed being driven by user input (such as opening a user specified collection of files), or from some of the context managers being optional: with ExitStack() as stack: for resource in resources: stack.enter_context(resource) if need_special_resource(): special = acquire_special_resource() stack.callback(release_special_resource, special) # Perform operations that use the acquired resources As shown, "ExitStack" also makes it quite easy to use "with" statements to manage arbitrary resources that don’t natively support the context management protocol. Catching exceptions from "__enter__" methods -------------------------------------------- It is occasionally desirable to catch exceptions from an "__enter__" method implementation, *without* inadvertently catching exceptions from the "with" statement body or the context manager’s "__exit__" method. 
By using "ExitStack" the steps in the context management protocol can be separated slightly in order to allow this: stack = ExitStack() try: x = stack.enter_context(cm) except Exception: # handle __enter__ exception else: with stack: # Handle normal case Actually needing to do this is likely to indicate that the underlying API should be providing a direct resource management interface for use with "try"/"except"/"finally" statements, but not all APIs are well designed in that regard. When a context manager is the only resource management API provided, then "ExitStack" can make it easier to handle various situations that can’t be handled directly in a "with" statement. Cleaning up in an "__enter__" implementation -------------------------------------------- As noted in the documentation of "ExitStack.push()", this method can be useful in cleaning up an already allocated resource if later steps in the "__enter__()" implementation fail. Here’s an example of doing this for a context manager that accepts resource acquisition and release functions, along with an optional validation function, and maps them to the context management protocol: from contextlib import contextmanager, AbstractContextManager, ExitStack class ResourceManager(AbstractContextManager): def __init__(self, acquire_resource, release_resource, check_resource_ok=None): self.acquire_resource = acquire_resource self.release_resource = release_resource if check_resource_ok is None: def check_resource_ok(resource): return True self.check_resource_ok = check_resource_ok @contextmanager def _cleanup_on_error(self): with ExitStack() as stack: stack.push(self) yield # The validation check passed and didn't raise an exception # Accordingly, we want to keep the resource, and pass it # back to our caller stack.pop_all() def __enter__(self): resource = self.acquire_resource() with self._cleanup_on_error(): if not self.check_resource_ok(resource): msg = "Failed validation for {!r}" raise RuntimeError(msg.format(resource)) return resource def __exit__(self, *exc_details): # We don't need to duplicate any of our resource release logic self.release_resource() Replacing any use of "try-finally" and flag variables ----------------------------------------------------- A pattern you will sometimes see is a "try-finally" statement with a flag variable to indicate whether or not the body of the "finally" clause should be executed. In its simplest form (that can’t already be handled just by using an "except" clause instead), it looks something like this: cleanup_needed = True try: result = perform_operation() if result: cleanup_needed = False finally: if cleanup_needed: cleanup_resources() As with any "try" statement based code, this can cause problems for development and review, because the setup code and the cleanup code can end up being separated by arbitrarily long sections of code. "ExitStack" makes it possible to instead register a callback for execution at the end of a "with" statement, and then later decide to skip executing that callback: from contextlib import ExitStack with ExitStack() as stack: stack.callback(cleanup_resources) result = perform_operation() if result: stack.pop_all() This allows the intended cleanup behaviour to be made explicit up front, rather than requiring a separate flag variable. 
If a particular application uses this pattern a lot, it can be simplified even further by means of a small helper class: from contextlib import ExitStack class Callback(ExitStack): def __init__(self, callback, /, *args, **kwds): super().__init__() self.callback(callback, *args, **kwds) def cancel(self): self.pop_all() with Callback(cleanup_resources) as cb: result = perform_operation() if result: cb.cancel() If the resource cleanup isn’t already neatly bundled into a standalone function, then it is still possible to use the decorator form of "ExitStack.callback()" to declare the resource cleanup in advance: from contextlib import ExitStack with ExitStack() as stack: @stack.callback def cleanup_resources(): ... result = perform_operation() if result: stack.pop_all() Due to the way the decorator protocol works, a callback function declared this way cannot take any parameters. Instead, any resources to be released must be accessed as closure variables. Using a context manager as a function decorator ----------------------------------------------- "ContextDecorator" makes it possible to use a context manager in both an ordinary "with" statement and also as a function decorator. For example, it is sometimes useful to wrap functions or groups of statements with a logger that can track the time of entry and time of exit. Rather than writing both a function decorator and a context manager for the task, inheriting from "ContextDecorator" provides both capabilities in a single definition: from contextlib import ContextDecorator import logging logging.basicConfig(level=logging.INFO) class track_entry_and_exit(ContextDecorator): def __init__(self, name): self.name = name def __enter__(self): logging.info('Entering: %s', self.name) def __exit__(self, exc_type, exc, exc_tb): logging.info('Exiting: %s', self.name) Instances of this class can be used as both a context manager: with track_entry_and_exit('widget loader'): print('Some time consuming activity goes here') load_widget() And also as a function decorator: @track_entry_and_exit('widget loader') def activity(): print('Some time consuming activity goes here') load_widget() Note that there is one additional limitation when using context managers as function decorators: there’s no way to access the return value of "__enter__()". If that value is needed, then it is still necessary to use an explicit "with" statement. See also: **PEP 343** - The “with” statement The specification, background, and examples for the Python "with" statement. Single use, reusable and reentrant context managers =================================================== Most context managers are written in a way that means they can only be used effectively in a "with" statement once. These single use context managers must be created afresh each time they’re used - attempting to use them a second time will trigger an exception or otherwise not work correctly. This common limitation means that it is generally advisable to create context managers directly in the header of the "with" statement where they are used (as shown in all of the usage examples above). Files are an example of effectively single use context managers, since the first "with" statement will close the file, preventing any further IO operations using that file object. 
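For instance, reusing a file object that has already served as a context manager fails as soon as the second "with" statement tries to enter it (a minimal illustration; the filename is arbitrary):

   >>> f = open('example.txt', 'w')
   >>> with f:
   ...     _ = f.write('data')
   ...
   >>> with f:
   ...     pass
   ...
   Traceback (most recent call last):
     ...
   ValueError: I/O operation on closed file.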
Context managers created using "contextmanager()" are also single use context managers, and will complain about the underlying generator failing to yield if an attempt is made to use them a second time: >>> from contextlib import contextmanager >>> @contextmanager ... def singleuse(): ... print("Before") ... yield ... print("After") ... >>> cm = singleuse() >>> with cm: ... pass ... Before After >>> with cm: ... pass ... Traceback (most recent call last): ... RuntimeError: generator didn't yield Reentrant context managers -------------------------- More sophisticated context managers may be “reentrant”. These context managers can not only be used in multiple "with" statements, but may also be used *inside* a "with" statement that is already using the same context manager. "threading.RLock" is an example of a reentrant context manager, as are "suppress()", "redirect_stdout()", and "chdir()". Here’s a very simple example of reentrant use: >>> from contextlib import redirect_stdout >>> from io import StringIO >>> stream = StringIO() >>> write_to_stream = redirect_stdout(stream) >>> with write_to_stream: ... print("This is written to the stream rather than stdout") ... with write_to_stream: ... print("This is also written to the stream") ... >>> print("This is written directly to stdout") This is written directly to stdout >>> print(stream.getvalue()) This is written to the stream rather than stdout This is also written to the stream Real world examples of reentrancy are more likely to involve multiple functions calling each other and hence be far more complicated than this example. Note also that being reentrant is *not* the same thing as being thread safe. "redirect_stdout()", for example, is definitely not thread safe, as it makes a global modification to the system state by binding "sys.stdout" to a different stream. Reusable context managers ------------------------- Distinct from both single use and reentrant context managers are “reusable” context managers (or, to be completely explicit, “reusable, but not reentrant” context managers, since reentrant context managers are also reusable). These context managers support being used multiple times, but will fail (or otherwise not work correctly) if the specific context manager instance has already been used in a containing with statement. "threading.Lock" is an example of a reusable, but not reentrant, context manager (for a reentrant lock, it is necessary to use "threading.RLock" instead). Another example of a reusable, but not reentrant, context manager is "ExitStack", as it invokes *all* currently registered callbacks when leaving any with statement, regardless of where those callbacks were added: >>> from contextlib import ExitStack >>> stack = ExitStack() >>> with stack: ... stack.callback(print, "Callback: from first context") ... print("Leaving first context") ... Leaving first context Callback: from first context >>> with stack: ... stack.callback(print, "Callback: from second context") ... print("Leaving second context") ... Leaving second context Callback: from second context >>> with stack: ... stack.callback(print, "Callback: from outer context") ... with stack: ... stack.callback(print, "Callback: from inner context") ... print("Leaving inner context") ... print("Leaving outer context") ... 
Leaving inner context Callback: from inner context Callback: from outer context Leaving outer context As the output from the example shows, reusing a single stack object across multiple with statements works correctly, but attempting to nest them will cause the stack to be cleared at the end of the innermost with statement, which is unlikely to be desirable behaviour. Using separate "ExitStack" instances instead of reusing a single instance avoids that problem: >>> from contextlib import ExitStack >>> with ExitStack() as outer_stack: ... outer_stack.callback(print, "Callback: from outer context") ... with ExitStack() as inner_stack: ... inner_stack.callback(print, "Callback: from inner context") ... print("Leaving inner context") ... print("Leaving outer context") ... Leaving inner context Callback: from inner context Leaving outer context Callback: from outer context "contextvars" — Context Variables ********************************* ====================================================================== This module provides APIs to manage, store, and access context-local state. The "ContextVar" class is used to declare and work with *Context Variables*. The "copy_context()" function and the "Context" class should be used to manage the current context in asynchronous frameworks. Context managers that have state should use Context Variables instead of "threading.local()" to prevent their state from bleeding to other code unexpectedly, when used in concurrent code. See also **PEP 567** for additional details. Added in version 3.7. Context Variables ================= class contextvars.ContextVar(name[, *, default]) This class is used to declare a new Context Variable, e.g.: var: ContextVar[int] = ContextVar('var', default=42) The required *name* parameter is used for introspection and debug purposes. The optional keyword-only *default* parameter is returned by "ContextVar.get()" when no value for the variable is found in the current context. **Important:** Context Variables should be created at the top module level and never in closures. "Context" objects hold strong references to context variables which prevents context variables from being properly garbage collected. name The name of the variable. This is a read-only property. Added in version 3.7.1. get([default]) Return a value for the context variable for the current context. If there is no value for the variable in the current context, the method will: * return the value of the *default* argument of the method, if provided; or * return the default value for the context variable, if it was created with one; or * raise a "LookupError". set(value) Call to set a new value for the context variable in the current context. The required *value* argument is the new value for the context variable. Returns a "Token" object that can be used to restore the variable to its previous value via the "ContextVar.reset()" method. reset(token) Reset the context variable to the value it had before the "ContextVar.set()" that created the *token* was used. For example: var = ContextVar('var') token = var.set('new value') # code that uses 'var'; var.get() returns 'new value'. var.reset(token) # After the reset call the var has no value again, so # var.get() would raise a LookupError. class contextvars.Token *Token* objects are returned by the "ContextVar.set()" method. They can be passed to the "ContextVar.reset()" method to revert the value of the variable to what it was before the corresponding *set*. var A read-only property. 
Points to the "ContextVar" object that created the token. old_value A read-only property. Set to the value the variable had before the "ContextVar.set()" method call that created the token. It points to "Token.MISSING" if the variable was not set before the call. MISSING A marker object used by "Token.old_value". Manual Context Management ========================= contextvars.copy_context() Returns a copy of the current "Context" object. The following snippet gets a copy of the current context and prints all variables and their values that are set in it: ctx: Context = copy_context() print(list(ctx.items())) The function has an *O*(1) complexity, i.e. works equally fast for contexts with a few context variables and for contexts that have a lot of them. class contextvars.Context A mapping of "ContextVars" to their values. "Context()" creates an empty context with no values in it. To get a copy of the current context use the "copy_context()" function. Each thread has its own effective stack of "Context" objects. The *current context* is the "Context" object at the top of the current thread’s stack. All "Context" objects in the stacks are considered to be *entered*. *Entering* a context, which can be done by calling its "run()" method, makes the context the current context by pushing it onto the top of the current thread’s context stack. *Exiting* from the current context, which can be done by returning from the callback passed to the "run()" method, restores the current context to what it was before the context was entered by popping the context off the top of the context stack. Since each thread has its own context stack, "ContextVar" objects behave in a similar fashion to "threading.local()" when values are assigned in different threads. Attempting to enter an already entered context, including contexts entered in other threads, raises a "RuntimeError". After exiting a context, it can later be re-entered (from any thread). Any changes to "ContextVar" values via the "ContextVar.set()" method are recorded in the current context. The "ContextVar.get()" method returns the value associated with the current context. Exiting a context effectively reverts any changes made to context variables while the context was entered (if needed, the values can be restored by re-entering the context). Context implements the "collections.abc.Mapping" interface. run(callable, *args, **kwargs) Enters the Context, executes "callable(*args, **kwargs)", then exits the Context. Returns *callable*’s return value, or propagates an exception if one occurred. Example: import contextvars var = contextvars.ContextVar('var') var.set('spam') print(var.get()) # 'spam' ctx = contextvars.copy_context() def main(): # 'var' was set to 'spam' before # calling 'copy_context()' and 'ctx.run(main)', so: print(var.get()) # 'spam' print(ctx[var]) # 'spam' var.set('ham') # Now, after setting 'var' to 'ham': print(var.get()) # 'ham' print(ctx[var]) # 'ham' # Any changes that the 'main' function makes to 'var' # will be contained in 'ctx'. ctx.run(main) # The 'main()' function was run in the 'ctx' context, # so changes to 'var' are contained in it: print(ctx[var]) # 'ham' # However, outside of 'ctx', 'var' is still set to 'spam': print(var.get()) # 'spam' copy() Return a shallow copy of the context object. var in context Return "True" if the *context* has a value for *var* set; return "False" otherwise. context[var] Return the value of the *var* "ContextVar" variable. 
If the variable is not set in the context object, a "KeyError" is raised. get(var[, default]) Return the value for *var* if *var* has a value in the context object. Return *default* otherwise. If *default* is not given, return "None". iter(context) Return an iterator over the variables stored in the context object. len(context) Return the number of variables set in the context object. keys() Return a list of all variables in the context object. values() Return a list of all variables’ values in the context object. items() Return a list of 2-tuples containing all variables and their values in the context object. asyncio support =============== Context variables are natively supported in "asyncio" and are ready to be used without any extra configuration. For example, here is a simple echo server that uses a context variable to make the address of a remote client available in the Task that handles that client: import asyncio import contextvars client_addr_var = contextvars.ContextVar('client_addr') def render_goodbye(): # The address of the currently handled client can be accessed # without passing it explicitly to this function. client_addr = client_addr_var.get() return f'Good bye, client @ {client_addr}\r\n'.encode() async def handle_request(reader, writer): addr = writer.transport.get_extra_info('socket').getpeername() client_addr_var.set(addr) # In any code that we call it is now possible to get the # client's address by calling 'client_addr_var.get()'. while True: line = await reader.readline() print(line) if not line.strip(): break writer.write(b'HTTP/1.1 200 OK\r\n') # status line writer.write(b'\r\n') # headers writer.write(render_goodbye()) # body writer.close() async def main(): srv = await asyncio.start_server( handle_request, '127.0.0.1', 8081) async with srv: await srv.serve_forever() asyncio.run(main()) # To test it you can use telnet or curl: # telnet 127.0.0.1 8081 # curl 127.0.0.1:8081 "copy" — Shallow and deep copy operations ***************************************** **Source code:** Lib/copy.py ====================================================================== Assignment statements in Python do not copy objects; they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other. This module provides generic shallow and deep copy operations (explained below). Interface summary: copy.copy(obj) Return a shallow copy of *obj*. copy.deepcopy(obj[, memo]) Return a deep copy of *obj*. copy.replace(obj, /, **changes) Create a new object of the same type as *obj*, replacing fields with values from *changes*. Added in version 3.13. exception copy.Error Raised for module-specific errors. The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances): * A *shallow copy* constructs a new compound object and then (to the extent possible) inserts *references* into it to the objects found in the original. * A *deep copy* constructs a new compound object and then, recursively, inserts *copies* into it of the objects found in the original. Two problems often exist with deep copy operations that don’t exist with shallow copy operations: * Recursive objects (compound objects that, directly or indirectly, contain a reference to themselves) may cause a recursive loop.
* Because deep copy copies everything it may copy too much, such as data which is intended to be shared between copies. The "deepcopy()" function avoids these problems by: * keeping a "memo" dictionary of objects already copied during the current copying pass; and * letting user-defined classes override the copying operation or the set of components copied. This module does not copy types like module, method, stack trace, stack frame, file, socket, window, or any similar types. It does “copy” functions and classes (shallowly and deeply), by returning the original object unchanged; this is compatible with the way these are treated by the "pickle" module. Shallow copies of dictionaries can be made using "dict.copy()", and of lists by assigning a slice of the entire list, for example, "copied_list = original_list[:]". Classes can use the same interfaces to control copying that they use to control pickling. See the description of module "pickle" for information on these methods. In fact, the "copy" module uses the registered pickle functions from the "copyreg" module. In order for a class to define its own copy implementation, it can define special methods "__copy__()" and "__deepcopy__()". object.__copy__(self) Called to implement the shallow copy operation; no additional arguments are passed. object.__deepcopy__(self, memo) Called to implement the deep copy operation; it is passed one argument, the *memo* dictionary. If the "__deepcopy__" implementation needs to make a deep copy of a component, it should call the "deepcopy()" function with the component as first argument and the *memo* dictionary as second argument. The *memo* dictionary should be treated as an opaque object. The function "copy.replace()" is more limited than "copy()" and "deepcopy()", and only supports named tuples created by "namedtuple()", "dataclasses", and other classes which define the method "__replace__()". object.__replace__(self, /, **changes) This method should create a new object of the same type, replacing fields with values from *changes*. Added in version 3.13. See also: Module "pickle" Discussion of the special methods used to support object state retrieval and restoration. "copyreg" — Register "pickle" support functions *********************************************** **Source code:** Lib/copyreg.py ====================================================================== The "copyreg" module offers a way to define functions used while pickling specific objects. The "pickle" and "copy" modules use those functions when pickling/copying those objects. The module provides configuration information about object constructors which are not classes. Such constructors may be factory functions or class instances. copyreg.constructor(object) Declares *object* to be a valid constructor. If *object* is not callable (and hence not valid as a constructor), raises "TypeError". copyreg.pickle(type, function, constructor_ob=None) Declares that *function* should be used as a “reduction” function for objects of type *type*. *function* must return either a string or a tuple containing between two and six elements. See the "dispatch_table" for more details on the interface of *function*. The *constructor_ob* parameter is a legacy feature and is now ignored, but if passed it must be a callable. Note that the "dispatch_table" attribute of a pickler object or subclass of "pickle.Pickler" can also be used for declaring reduction functions.
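For instance, a minimal sketch of registering a reduction function for one pickler only, via its "dispatch_table", rather than globally (the class "Point" is illustrative):

   import copyreg
   import io
   import pickle

   class Point:
       def __init__(self, x, y):
           self.x, self.y = x, y

   def reduce_point(p):
       return Point, (p.x, p.y)

   buf = io.BytesIO()
   pickler = pickle.Pickler(buf)
   # Start from the global copyreg table, then override locally;
   # other picklers are unaffected by this registration.
   pickler.dispatch_table = copyreg.dispatch_table.copy()
   pickler.dispatch_table[Point] = reduce_point
   pickler.dump(Point(1, 2))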
Example ======= The example below shows how to register a pickle function and how it is used: >>> import copyreg, copy, pickle >>> class C: ... def __init__(self, a): ... self.a = a ... >>> def pickle_c(c): ... print("pickling a C instance...") ... return C, (c.a,) ... >>> copyreg.pickle(C, pickle_c) >>> c = C(1) >>> d = copy.copy(c) pickling a C instance... >>> p = pickle.dumps(c) pickling a C instance... "crypt" — Function to check Unix passwords ****************************************** Deprecated since version 3.11, removed in version 3.13. This module is no longer part of the Python standard library. It was removed in Python 3.13 after being deprecated in Python 3.11. The removal was decided in **PEP 594**. Applications can use the "hashlib" module from the standard library. Other possible replacements are third-party libraries from PyPI: legacycrypt, bcrypt, argon2-cffi, or passlib. These are not supported or maintained by the Python core team. The last version of Python that provided the "crypt" module was Python 3.12. Cryptographic Services ********************** The modules described in this chapter implement various algorithms of a cryptographic nature. They are available at the discretion of the installation. Here’s an overview: * "hashlib" — Secure hashes and message digests * Hash algorithms * Usage * Constructors * Attributes * Hash Objects * SHAKE variable length digests * File hashing * Key derivation * BLAKE2 * Creating hash objects * Constants * Examples * Simple hashing * Using different digest sizes * Keyed hashing * Randomized hashing * Personalization * Tree mode * Credits * "hmac" — Keyed-Hashing for Message Authentication * "secrets" — Generate secure random numbers for managing secrets * Random numbers * Generating tokens * How many bytes should tokens use? * Other functions * Recipes and best practices "csv" — CSV File Reading and Writing ************************************ **Source code:** Lib/csv.py ====================================================================== The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in **RFC 4180**. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer. The "csv" module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats. The "csv" module’s "reader" and "writer" objects read and write sequences. Programmers can also read and write data in dictionary form using the "DictReader" and "DictWriter" classes. See also: **PEP 305** - CSV File API The Python Enhancement Proposal which proposed this addition to Python.
Module Contents =============== The "csv" module defines the following functions: csv.reader(csvfile, dialect='excel', **fmtparams) Return a reader object that will process lines from the given *csvfile*. A csvfile must be an iterable of strings, each in the reader’s defined csv format. A csvfile is most commonly a file-like object or list. If *csvfile* is a file object, it should be opened with "newline=''". [1] An optional *dialect* parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the "Dialect" class or one of the strings returned by the "list_dialects()" function. The other optional *fmtparams* keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about the dialect and formatting parameters, see section Dialects and Formatting Parameters. Each row read from the csv file is returned as a list of strings. No automatic data type conversion is performed unless the "QUOTE_NONNUMERIC" format option is specified (in which case unquoted fields are transformed into floats). A short usage example: >>> import csv >>> with open('eggs.csv', newline='') as csvfile: ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') ... for row in spamreader: ... print(', '.join(row)) Spam, Spam, Spam, Spam, Spam, Baked Beans Spam, Lovely Spam, Wonderful Spam csv.writer(csvfile, dialect='excel', **fmtparams) Return a writer object responsible for converting the user’s data into delimited strings on the given file-like object. *csvfile* can be any object with a "write()" method. If *csvfile* is a file object, it should be opened with "newline=''" [1]. An optional *dialect* parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the "Dialect" class or one of the strings returned by the "list_dialects()" function. The other optional *fmtparams* keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about dialects and formatting parameters, see the Dialects and Formatting Parameters section. To make it as easy as possible to interface with modules which implement the DB API, the value "None" is written as the empty string. While this isn’t a reversible transformation, it makes it easier to dump SQL NULL data values to CSV files without preprocessing the data returned from a "cursor.fetch*" call. All other non-string data are stringified with "str()" before being written. A short usage example: import csv with open('eggs.csv', 'w', newline='') as csvfile: spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL) spamwriter.writerow(['Spam'] * 5 + ['Baked Beans']) spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam']) csv.register_dialect(name[, dialect[, **fmtparams]]) Associate *dialect* with *name*. *name* must be a string. The dialect can be specified either by passing a sub-class of "Dialect", or by *fmtparams* keyword arguments, or both, with keyword arguments overriding parameters of the dialect. For full details about dialects and formatting parameters, see section Dialects and Formatting Parameters. csv.unregister_dialect(name) Delete the dialect associated with *name* from the dialect registry. An "Error" is raised if *name* is not a registered dialect name. csv.get_dialect(name) Return the dialect associated with *name*. 
An "Error" is raised if *name* is not a registered dialect name. This function returns an immutable "Dialect". csv.list_dialects() Return the names of all registered dialects. csv.field_size_limit([new_limit]) Returns the current maximum field size allowed by the parser. If *new_limit* is given, this becomes the new limit. The "csv" module defines the following classes: class csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) Create an object that operates like a regular reader but maps the information in each row to a "dict" whose keys are given by the optional *fieldnames* parameter. The *fieldnames* parameter is a *sequence*. If *fieldnames* is omitted, the values in the first row of file *f* will be used as the fieldnames and will be omitted from the results. If *fieldnames* is provided, they will be used and the first row will be included in the results. Regardless of how the fieldnames are determined, the dictionary preserves their original ordering. If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by *restkey* (which defaults to "None"). If a non-blank row has fewer fields than fieldnames, the missing values are filled-in with the value of *restval* (which defaults to "None"). All other optional or keyword arguments are passed to the underlying "reader" instance. If the argument passed to *fieldnames* is an iterator, it will be coerced to a "list". Changed in version 3.6: Returned rows are now of type "OrderedDict". Changed in version 3.8: Returned rows are now of type "dict". A short usage example: >>> import csv >>> with open('names.csv', newline='') as csvfile: ... reader = csv.DictReader(csvfile) ... for row in reader: ... print(row['first_name'], row['last_name']) ... Eric Idle John Cleese >>> print(row) {'first_name': 'John', 'last_name': 'Cleese'} class csv.DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds) Create an object which operates like a regular writer but maps dictionaries onto output rows. The *fieldnames* parameter is a "sequence" of keys that identify the order in which values in the dictionary passed to the "writerow()" method are written to file *f*. The optional *restval* parameter specifies the value to be written if the dictionary is missing a key in *fieldnames*. If the dictionary passed to the "writerow()" method contains a key not found in *fieldnames*, the optional *extrasaction* parameter indicates what action to take. If it is set to "'raise'", the default value, a "ValueError" is raised. If it is set to "'ignore'", extra values in the dictionary are ignored. Any other optional or keyword arguments are passed to the underlying "writer" instance. Note that unlike the "DictReader" class, the *fieldnames* parameter of the "DictWriter" class is not optional. If the argument passed to *fieldnames* is an iterator, it will be coerced to a "list". A short usage example: import csv with open('names.csv', 'w', newline='') as csvfile: fieldnames = ['first_name', 'last_name'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'}) writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'}) writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'}) class csv.Dialect The "Dialect" class is a container class whose attributes contain information for how to handle doublequotes, whitespace, delimiters, etc. 
Due to the lack of a strict CSV specification, different applications produce subtly different CSV data. "Dialect" instances define how "reader" and "writer" instances behave. All available "Dialect" names are returned by "list_dialects()", and they can be registered with specific "reader" and "writer" classes through their initializer ("__init__") functions like this: import csv with open('students.csv', 'w', newline='') as csvfile: writer = csv.writer(csvfile, dialect='unix') class csv.excel The "excel" class defines the usual properties of an Excel- generated CSV file. It is registered with the dialect name "'excel'". class csv.excel_tab The "excel_tab" class defines the usual properties of an Excel- generated TAB-delimited file. It is registered with the dialect name "'excel-tab'". class csv.unix_dialect The "unix_dialect" class defines the usual properties of a CSV file generated on UNIX systems, i.e. using "'\n'" as line terminator and quoting all fields. It is registered with the dialect name "'unix'". Added in version 3.2. class csv.Sniffer The "Sniffer" class is used to deduce the format of a CSV file. The "Sniffer" class provides two methods: sniff(sample, delimiters=None) Analyze the given *sample* and return a "Dialect" subclass reflecting the parameters found. If the optional *delimiters* parameter is given, it is interpreted as a string containing possible valid delimiter characters. has_header(sample) Analyze the sample text (presumed to be in CSV format) and return "True" if the first row appears to be a series of column headers. Inspecting each column, one of two key criteria will be considered to estimate if the sample contains a header: * the second through n-th rows contain numeric values * the second through n-th rows contain strings where at least one value’s length differs from that of the putative header of that column. Twenty rows after the first row are sampled; if more than half of columns + rows meet the criteria, "True" is returned. Note: This method is a rough heuristic and may produce both false positives and negatives. An example for "Sniffer" use: with open('example.csv', newline='') as csvfile: dialect = csv.Sniffer().sniff(csvfile.read(1024)) csvfile.seek(0) reader = csv.reader(csvfile, dialect) # ... process CSV file contents here ... The "csv" module defines the following constants: csv.QUOTE_ALL Instructs "writer" objects to quote all fields. csv.QUOTE_MINIMAL Instructs "writer" objects to only quote those fields which contain special characters such as *delimiter*, *quotechar* or any of the characters in *lineterminator*. csv.QUOTE_NONNUMERIC Instructs "writer" objects to quote all non-numeric fields. Instructs "reader" objects to convert all non-quoted fields to type "float". Note: Some numeric types, such as "bool", "Fraction", or "IntEnum", have a string representation that cannot be converted to "float". They cannot be read in the "QUOTE_NONNUMERIC" and "QUOTE_STRINGS" modes. csv.QUOTE_NONE Instructs "writer" objects to never quote fields. When the current *delimiter* occurs in output data it is preceded by the current *escapechar* character. If *escapechar* is not set, the writer will raise "Error" if any characters that require escaping are encountered. Instructs "reader" objects to perform no special processing of quote characters. csv.QUOTE_NOTNULL Instructs "writer" objects to quote all fields which are not "None". This is similar to "QUOTE_ALL", except that if a field value is "None" an empty (unquoted) string is written. 
Instructs "reader" objects to interpret an empty (unquoted) field as "None" and to otherwise behave as "QUOTE_ALL". Added in version 3.12. csv.QUOTE_STRINGS Instructs "writer" objects to always place quotes around fields which are strings. This is similar to "QUOTE_NONNUMERIC", except that if a field value is "None" an empty (unquoted) string is written. Instructs "reader" objects to interpret an empty (unquoted) string as "None" and to otherwise behave as "QUOTE_NONNUMERIC". Added in version 3.12. The "csv" module defines the following exception: exception csv.Error Raised by any of the functions when an error is detected. Dialects and Formatting Parameters ================================== To make it easier to specify the format of input and output records, specific formatting parameters are grouped together into dialects. A dialect is a subclass of the "Dialect" class containing various attributes describing the format of the CSV file. When creating "reader" or "writer" objects, the programmer can specify a string or a subclass of the "Dialect" class as the dialect parameter. In addition to, or instead of, the *dialect* parameter, the programmer can also specify individual formatting parameters, which have the same names as the attributes defined below for the "Dialect" class. Dialects support the following attributes: Dialect.delimiter A one-character string used to separate fields. It defaults to "','". Dialect.doublequote Controls how instances of *quotechar* appearing inside a field should themselves be quoted. When "True", the character is doubled. When "False", the *escapechar* is used as a prefix to the *quotechar*. It defaults to "True". On output, if *doublequote* is "False" and no *escapechar* is set, "Error" is raised if a *quotechar* is found in a field. Dialect.escapechar A one-character string used by the writer to escape the *delimiter* if *quoting* is set to "QUOTE_NONE" and the *quotechar* if *doublequote* is "False". On reading, the *escapechar* removes any special meaning from the following character. It defaults to "None", which disables escaping. Changed in version 3.11: An empty *escapechar* is not allowed. Dialect.lineterminator The string used to terminate lines produced by the "writer". It defaults to "'\r\n'". Note: The "reader" is hard-coded to recognise either "'\r'" or "'\n'" as end-of-line, and ignores *lineterminator*. This behavior may change in the future. Dialect.quotechar A one-character string used to quote fields containing special characters, such as the *delimiter* or *quotechar*, or which contain new-line characters. It defaults to "'"'". Changed in version 3.11: An empty *quotechar* is not allowed. Dialect.quoting Controls when quotes should be generated by the writer and recognised by the reader. It can take on any of the QUOTE_* constants and defaults to "QUOTE_MINIMAL". Dialect.skipinitialspace When "True", spaces immediately following the *delimiter* are ignored. The default is "False". Dialect.strict When "True", raise exception "Error" on bad CSV input. The default is "False". Reader Objects ============== Reader objects ("DictReader" instances and objects returned by the "reader()" function) have the following public methods: csvreader.__next__() Return the next row of the reader’s iterable object as a list (if the object was returned from "reader()") or a dict (if it is a "DictReader" instance), parsed according to the current "Dialect". Usually you should call this as "next(reader)". 
Reader objects have the following public attributes: csvreader.dialect A read-only description of the dialect in use by the parser. csvreader.line_num The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines. DictReader objects have the following public attribute: DictReader.fieldnames If not passed as a parameter when creating the object, this attribute is initialized upon first access or when the first record is read from the file. Writer Objects ============== "writer" objects ("DictWriter" instances and objects returned by the "writer()" function) have the following public methods. A *row* must be an iterable of strings or numbers for "writer" objects and a dictionary mapping fieldnames to strings or numbers (by passing them through "str()" first) for "DictWriter" objects. Note that complex numbers are written out surrounded by parens. This may cause some problems for other programs which read CSV files (assuming they support complex numbers at all). csvwriter.writerow(row) Write the *row* parameter to the writer’s file object, formatted according to the current "Dialect". Return the return value of the call to the *write* method of the underlying file object. Changed in version 3.5: Added support of arbitrary iterables. csvwriter.writerows(rows) Write all elements in *rows* (an iterable of *row* objects as described above) to the writer’s file object, formatted according to the current dialect. Writer objects have the following public attribute: csvwriter.dialect A read-only description of the dialect in use by the writer. DictWriter objects have the following public method: DictWriter.writeheader() Write a row with the field names (as specified in the constructor) to the writer’s file object, formatted according to the current dialect. Return the return value of the "csvwriter.writerow()" call used internally. Added in version 3.2. Changed in version 3.8: "writeheader()" now also returns the value returned by the "csvwriter.writerow()" method it uses internally. Examples ======== The simplest example of reading a CSV file: import csv with open('some.csv', newline='') as f: reader = csv.reader(f) for row in reader: print(row) Reading a file with an alternate format: import csv with open('passwd', newline='') as f: reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE) for row in reader: print(row) The corresponding simplest possible writing example is: import csv with open('some.csv', 'w', newline='') as f: writer = csv.writer(f) writer.writerows(someiterable) Since "open()" is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see "locale.getencoding()"). To decode a file using a different encoding, use the "encoding" argument of open: import csv with open('some.csv', newline='', encoding='utf-8') as f: reader = csv.reader(f) for row in reader: print(row) The same applies to writing in something other than the system default encoding: specify the encoding argument when opening the output file. 
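The dictionary-based classes follow the same pattern. A brief sketch (field names invented for the example) writes a file with "writeheader()" and reads it back with "DictReader":

   import csv

   fieldnames = ['name', 'dept']
   with open('staff.csv', 'w', newline='') as f:
       writer = csv.DictWriter(f, fieldnames=fieldnames)
       writer.writeheader()                       # returns what writerow() returns
       writer.writerows([{'name': 'Ada', 'dept': 'eng'},
                         {'name': 'Grace', 'dept': 'ops'}])

   with open('staff.csv', newline='') as f:
       for row in csv.DictReader(f):
           print(row['name'], row['dept'])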
Registering a new dialect:

   import csv
   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
   with open('passwd', newline='') as f:
       reader = csv.reader(f, 'unixpwd')

A slightly more advanced use of the reader — catching and reporting errors:

   import csv, sys
   filename = 'some.csv'
   with open(filename, newline='') as f:
       reader = csv.reader(f)
       try:
           for row in reader:
               print(row)
       except csv.Error as e:
           sys.exit(f'file {filename}, line {reader.line_num}: {e}')

And while the module doesn’t directly support parsing strings, it can easily be done:

   import csv
   for row in csv.reader(['one,two,three']):
       print(row)

-[ Footnotes ]-

[1] If "newline=''" is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use "\r\n" line endings on write an extra "\r" will be added. It should always be safe to specify "newline=''", since the csv module does its own (*universal*) newline handling.

"ctypes" — A foreign function library for Python
************************************************

**Source code:** Lib/ctypes

======================================================================

"ctypes" is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

ctypes tutorial
===============

Note: The code samples in this tutorial use "doctest" to make sure that they actually work. Since some code samples behave differently under Linux, Windows, or macOS, they contain doctest directives in comments.

Note: Some code samples reference the ctypes "c_int" type. On platforms where "sizeof(long) == sizeof(int)" it is an alias to "c_long". So, you should not be confused if "c_long" is printed where you would expect "c_int" — they are actually the same type.

Loading dynamic link libraries
------------------------------

"ctypes" exports the *cdll*, and on Windows *windll* and *oledll* objects, for loading dynamic link libraries.

You load libraries by accessing them as attributes of these objects. *cdll* loads libraries which export functions using the standard "cdecl" calling convention, while *windll* libraries call functions using the "stdcall" calling convention. *oledll* also uses the "stdcall" calling convention, and assumes the functions return a Windows "HRESULT" error code. The error code is used to automatically raise an "OSError" exception when the function call fails.

Changed in version 3.3: Windows errors used to raise "WindowsError", which is now an alias of "OSError".

Here are some examples for Windows. Note that "msvcrt" is the MS standard C library containing most standard C functions, and uses the "cdecl" calling convention:

   >>> from ctypes import *
   >>> print(windll.kernel32)
   <WinDLL 'kernel32', handle ... at ...>
   >>> print(cdll.msvcrt)
   <CDLL 'msvcrt', handle ... at ...>
   >>> libc = cdll.msvcrt
   >>>

Windows appends the usual ".dll" file suffix automatically.

Note: Accessing the standard C library through "cdll.msvcrt" will use an outdated version of the library that may be incompatible with the one being used by Python. Where possible, use native Python functionality, or else import and use the "msvcrt" module.

On Linux, it is required to specify the filename *including* the extension to load a library, so attribute access can not be used to load libraries. Either the "LoadLibrary()" method of the dll loaders should be used, or you should load the library by creating an instance of CDLL by calling the constructor:

   >>> cdll.LoadLibrary("libc.so.6")
   <CDLL 'libc.so.6', handle ... at ...>
   >>> libc = CDLL("libc.so.6")
   >>> libc
   <CDLL 'libc.so.6', handle ... at ...>
   >>>
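To avoid hardcoding the filename, the "find_library()" helper from "ctypes.util" (documented later in this chapter) can locate it; a sketch, assuming a POSIX system:

   from ctypes import CDLL
   from ctypes.util import find_library

   name = find_library("c")       # e.g. 'libc.so.6' on Linux; may be None
   if name is not None:
       libc = CDLL(name)          # equivalent to cdll.LoadLibrary(name)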
Either the "LoadLibrary()" method of the dll loaders should be used, or you should load the library by creating an instance of CDLL by calling the constructor: >>> cdll.LoadLibrary("libc.so.6") >>> libc = CDLL("libc.so.6") >>> libc >>> Accessing functions from loaded dlls ------------------------------------ Functions are accessed as attributes of dll objects: >>> libc.printf <_FuncPtr object at 0x...> >>> print(windll.kernel32.GetModuleHandleA) <_FuncPtr object at 0x...> >>> print(windll.kernel32.MyOwnFunction) Traceback (most recent call last): File "", line 1, in File "ctypes.py", line 239, in __getattr__ func = _StdcallFuncPtr(name, self) AttributeError: function 'MyOwnFunction' not found >>> Note that win32 system dlls like "kernel32" and "user32" often export ANSI as well as UNICODE versions of a function. The UNICODE version is exported with a "W" appended to the name, while the ANSI version is exported with an "A" appended to the name. The win32 "GetModuleHandle" function, which returns a *module handle* for a given module name, has the following C prototype, and a macro is used to expose one of them as "GetModuleHandle" depending on whether UNICODE is defined or not: /* ANSI version */ HMODULE GetModuleHandleA(LPCSTR lpModuleName); /* UNICODE version */ HMODULE GetModuleHandleW(LPCWSTR lpModuleName); *windll* does not try to select one of them by magic, you must access the version you need by specifying "GetModuleHandleA" or "GetModuleHandleW" explicitly, and then call it with bytes or string objects respectively. Sometimes, dlls export functions with names which aren’t valid Python identifiers, like ""??2@YAPAXI@Z"". In this case you have to use "getattr()" to retrieve the function: >>> getattr(cdll.msvcrt, "??2@YAPAXI@Z") <_FuncPtr object at 0x...> >>> On Windows, some dlls export functions not by name but by ordinal. These functions can be accessed by indexing the dll object with the ordinal number: >>> cdll.kernel32[1] <_FuncPtr object at 0x...> >>> cdll.kernel32[0] Traceback (most recent call last): File "", line 1, in File "ctypes.py", line 310, in __getitem__ func = _StdcallFuncPtr(name, self) AttributeError: function ordinal 0 not found >>> Calling functions ----------------- You can call these functions like any other Python callable. This example uses the "rand()" function, which takes no arguments and returns a pseudo-random integer: >>> print(libc.rand()) 1804289383 On Windows, you can call the "GetModuleHandleA()" function, which returns a win32 module handle (passing "None" as single argument to call it with a "NULL" pointer): >>> print(hex(windll.kernel32.GetModuleHandleA(None))) 0x1d000000 >>> "ValueError" is raised when you call an "stdcall" function with the "cdecl" calling convention, or vice versa: >>> cdll.kernel32.GetModuleHandleA(None) Traceback (most recent call last): File "", line 1, in ValueError: Procedure probably called with not enough arguments (4 bytes missing) >>> >>> windll.msvcrt.printf(b"spam") Traceback (most recent call last): File "", line 1, in ValueError: Procedure probably called with too many arguments (4 bytes in excess) >>> To find out the correct calling convention you have to look into the C header file or the documentation for the function you want to call. 
On Windows, "ctypes" uses win32 structured exception handling to prevent crashes from general protection faults when functions are called with invalid argument values: >>> windll.kernel32.GetModuleHandleA(32) Traceback (most recent call last): File "", line 1, in OSError: exception: access violation reading 0x00000020 >>> There are, however, enough ways to crash Python with "ctypes", so you should be careful anyway. The "faulthandler" module can be helpful in debugging crashes (e.g. from segmentation faults produced by erroneous C library calls). "None", integers, bytes objects and (unicode) strings are the only native Python objects that can directly be used as parameters in these function calls. "None" is passed as a C "NULL" pointer, bytes objects and strings are passed as pointer to the memory block that contains their data (char* or wchar_t*). Python integers are passed as the platform’s default C int type, their value is masked to fit into the C type. Before we move on calling functions with other parameter types, we have to learn more about "ctypes" data types. Fundamental data types ---------------------- "ctypes" defines a number of primitive C compatible data types: +------------------------+--------------------------------------------+------------------------------+ | ctypes type | C type | Python type | |========================|============================================|==============================| | "c_bool" | _Bool | bool (1) | +------------------------+--------------------------------------------+------------------------------+ | "c_char" | char | 1-character bytes object | +------------------------+--------------------------------------------+------------------------------+ | "c_wchar" | "wchar_t" | 1-character string | +------------------------+--------------------------------------------+------------------------------+ | "c_byte" | char | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ubyte" | unsigned char | int | +------------------------+--------------------------------------------+------------------------------+ | "c_short" | short | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ushort" | unsigned short | int | +------------------------+--------------------------------------------+------------------------------+ | "c_int" | int | int | +------------------------+--------------------------------------------+------------------------------+ | "c_uint" | unsigned int | int | +------------------------+--------------------------------------------+------------------------------+ | "c_long" | long | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ulong" | unsigned long | int | +------------------------+--------------------------------------------+------------------------------+ | "c_longlong" | __int64 or long long | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ulonglong" | unsigned __int64 or unsigned long long | int | +------------------------+--------------------------------------------+------------------------------+ | "c_size_t" | "size_t" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ssize_t" | "ssize_t" or Py_ssize_t | int | +------------------------+--------------------------------------------+------------------------------+ | 
"c_time_t" | "time_t" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_float" | float | float | +------------------------+--------------------------------------------+------------------------------+ | "c_double" | double | float | +------------------------+--------------------------------------------+------------------------------+ | "c_longdouble" | long double | float | +------------------------+--------------------------------------------+------------------------------+ | "c_char_p" | char* (NUL terminated) | bytes object or "None" | +------------------------+--------------------------------------------+------------------------------+ | "c_wchar_p" | wchar_t* (NUL terminated) | string or "None" | +------------------------+--------------------------------------------+------------------------------+ | "c_void_p" | void* | int or "None" | +------------------------+--------------------------------------------+------------------------------+ 1. The constructor accepts any object with a truth value. All these types can be created by calling them with an optional initializer of the correct type and value: >>> c_int() c_long(0) >>> c_wchar_p("Hello, World") c_wchar_p(140018365411392) >>> c_ushort(-3) c_ushort(65533) >>> Since these types are mutable, their value can also be changed afterwards: >>> i = c_int(42) >>> print(i) c_long(42) >>> print(i.value) 42 >>> i.value = -99 >>> print(i.value) -99 >>> Assigning a new value to instances of the pointer types "c_char_p", "c_wchar_p", and "c_void_p" changes the *memory location* they point to, *not the contents* of the memory block (of course not, because Python string objects are immutable): >>> s = "Hello, World" >>> c_s = c_wchar_p(s) >>> print(c_s) c_wchar_p(139966785747344) >>> print(c_s.value) Hello World >>> c_s.value = "Hi, there" >>> print(c_s) # the memory location has changed c_wchar_p(139966783348904) >>> print(c_s.value) Hi, there >>> print(s) # first object is unchanged Hello, World >>> You should be careful, however, not to pass them to functions expecting pointers to mutable memory. If you need mutable memory blocks, ctypes has a "create_string_buffer()" function which creates these in various ways. The current memory block contents can be accessed (or changed) with the "raw" property; if you want to access it as NUL terminated string, use the "value" property: >>> from ctypes import * >>> p = create_string_buffer(3) # create a 3 byte buffer, initialized to NUL bytes >>> print(sizeof(p), repr(p.raw)) 3 b'\x00\x00\x00' >>> p = create_string_buffer(b"Hello") # create a buffer containing a NUL terminated string >>> print(sizeof(p), repr(p.raw)) 6 b'Hello\x00' >>> print(repr(p.value)) b'Hello' >>> p = create_string_buffer(b"Hello", 10) # create a 10 byte buffer >>> print(sizeof(p), repr(p.raw)) 10 b'Hello\x00\x00\x00\x00\x00' >>> p.value = b"Hi" >>> print(sizeof(p), repr(p.raw)) 10 b'Hi\x00lo\x00\x00\x00\x00\x00' >>> The "create_string_buffer()" function replaces the old "c_buffer()" function (which is still available as an alias). To create a mutable memory block containing unicode characters of the C type "wchar_t", use the "create_unicode_buffer()" function. Calling functions, continued ---------------------------- Note that printf prints to the real standard output channel, *not* to "sys.stdout", so these examples will only work at the console prompt, not from within *IDLE* or *PythonWin*: >>> printf = libc.printf >>> printf(b"Hello, %s\n", b"World!") Hello, World! 
Calling functions, continued
----------------------------

Note that printf prints to the real standard output channel, *not* to "sys.stdout", so these examples will only work at the console prompt, not from within *IDLE* or *PythonWin*:

   >>> printf = libc.printf
   >>> printf(b"Hello, %s\n", b"World!")
   Hello, World!
   14
   >>> printf(b"Hello, %S\n", "World!")
   Hello, World!
   14
   >>> printf(b"%d bottles of beer\n", 42)
   42 bottles of beer
   19
   >>> printf(b"%f bottles of beer\n", 42.5)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ctypes.ArgumentError: argument 2: TypeError: Don't know how to convert parameter 2
   >>>

As has been mentioned before, all Python types except integers, strings, and bytes objects have to be wrapped in their corresponding "ctypes" type, so that they can be converted to the required C data type:

   >>> printf(b"An int %d, a double %f\n", 1234, c_double(3.14))
   An int 1234, a double 3.140000
   31
   >>>

Calling variadic functions
--------------------------

On a lot of platforms calling variadic functions through ctypes is exactly the same as calling functions with a fixed number of parameters. On some platforms, and in particular ARM64 for Apple Platforms, the calling convention for variadic functions is different than that for regular functions. On those platforms it is required to specify the "argtypes" attribute for the regular, non-variadic, function arguments:

   libc.printf.argtypes = [ctypes.c_char_p]

Because specifying the attribute does not inhibit portability, it is advised to always specify "argtypes" for all variadic functions.

Calling functions with your own custom data types
-------------------------------------------------

You can also customize "ctypes" argument conversion to allow instances of your own classes be used as function arguments. "ctypes" looks for an "_as_parameter_" attribute and uses this as the function argument. The attribute must be an integer, string, bytes, a "ctypes" instance, or an object with an "_as_parameter_" attribute:

   >>> class Bottles:
   ...     def __init__(self, number):
   ...         self._as_parameter_ = number
   ...
   >>> bottles = Bottles(42)
   >>> printf(b"%d bottles of beer\n", bottles)
   42 bottles of beer
   19
   >>>

If you don’t want to store the instance’s data in the "_as_parameter_" instance variable, you could define a "property" which makes the attribute available on request.

Specifying the required argument types (function prototypes)
------------------------------------------------------------

It is possible to specify the required argument types of functions exported from DLLs by setting the "argtypes" attribute. "argtypes" must be a sequence of C data types (the "printf()" function is probably not a good example here, because it takes a variable number and different types of parameters depending on the format string; on the other hand it is quite handy for experimenting with this feature):

   >>> printf.argtypes = [c_char_p, c_char_p, c_int, c_double]
   >>> printf(b"String '%s', Int %d, Double %f\n", b"Hi", 10, 2.2)
   String 'Hi', Int 10, Double 2.200000
   37
   >>>

Specifying a format protects against incompatible argument types (just as a prototype for a C function), and tries to convert the arguments to valid types:

   >>> printf(b"%d %d %d", 1, 2, 3)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ctypes.ArgumentError: argument 2: TypeError: 'int' object cannot be interpreted as ctypes.c_char_p
   >>> printf(b"%s %d %f\n", b"X", 2, 3)
   X 2 3.000000
   13
   >>>

If you have defined your own classes which you pass to function calls, you have to implement a "from_param()" class method for them before you can use them in the "argtypes" sequence.
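For instance, a wrapper type can validate its value and hand ctypes something it knows how to pass. This is a sketch with an invented class name; the protocol itself is described in the next paragraph:

   from ctypes import c_int

   class Percentage:
       """Hypothetical wrapper accepting only values 0-100."""
       def __init__(self, value):
           self.value = value

       @classmethod
       def from_param(cls, obj):
           # called by ctypes when Percentage appears in argtypes
           if not isinstance(obj, cls):
               raise TypeError("Percentage expected")
           if not 0 <= obj.value <= 100:
               raise ValueError("value out of range")
           return c_int(obj.value)   # a ctypes instance is acceptable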
The "from_param()" class method receives the Python object passed to the function call, it should do a typecheck or whatever is needed to make sure this object is acceptable, and then return the object itself, its "_as_parameter_" attribute, or whatever you want to pass as the C function argument in this case. Again, the result should be an integer, string, bytes, a "ctypes" instance, or an object with an "_as_parameter_" attribute. Return types ------------ By default functions are assumed to return the C int type. Other return types can be specified by setting the "restype" attribute of the function object. The C prototype of "time()" is "time_t time(time_t *)". Because "time_t" might be of a different type than the default return type int, you should specify the "restype" attribute: >>> libc.time.restype = c_time_t The argument types can be specified using "argtypes": >>> libc.time.argtypes = (POINTER(c_time_t),) To call the function with a "NULL" pointer as first argument, use "None": >>> print(libc.time(None)) 1150640792 Here is a more advanced example, it uses the "strchr()" function, which expects a string pointer and a char, and returns a pointer to a string: >>> strchr = libc.strchr >>> strchr(b"abcdef", ord("d")) 8059983 >>> strchr.restype = c_char_p # c_char_p is a pointer to a string >>> strchr(b"abcdef", ord("d")) b'def' >>> print(strchr(b"abcdef", ord("x"))) None >>> If you want to avoid the "ord("x")" calls above, you can set the "argtypes" attribute, and the second argument will be converted from a single character Python bytes object into a C char: >>> strchr.restype = c_char_p >>> strchr.argtypes = [c_char_p, c_char] >>> strchr(b"abcdef", b"d") b'def' >>> strchr(b"abcdef", b"def") Traceback (most recent call last): ctypes.ArgumentError: argument 2: TypeError: one character bytes, bytearray or integer expected >>> print(strchr(b"abcdef", b"x")) None >>> strchr(b"abcdef", b"d") b'def' >>> You can also use a callable Python object (a function or a class for example) as the "restype" attribute, if the foreign function returns an integer. The callable will be called with the *integer* the C function returns, and the result of this call will be used as the result of your function call. This is useful to check for error return values and automatically raise an exception: >>> GetModuleHandle = windll.kernel32.GetModuleHandleA >>> def ValidHandle(value): ... if value == 0: ... raise WinError() ... return value ... >>> >>> GetModuleHandle.restype = ValidHandle >>> GetModuleHandle(None) 486539264 >>> GetModuleHandle("something silly") Traceback (most recent call last): File "", line 1, in File "", line 3, in ValidHandle OSError: [Errno 126] The specified module could not be found. >>> "WinError" is a function which will call Windows "FormatMessage()" api to get the string representation of an error code, and *returns* an exception. "WinError" takes an optional error code parameter, if no one is used, it calls "GetLastError()" to retrieve it. Please note that a much more powerful error checking mechanism is available through the "errcheck" attribute; see the reference manual for details. Passing pointers (or: passing parameters by reference) ------------------------------------------------------ Sometimes a C api function expects a *pointer* to a data type as parameter, probably to write into the corresponding location, or if the data is too large to be passed by value. This is also known as *passing parameters by reference*. 
"ctypes" exports the "byref()" function which is used to pass parameters by reference. The same effect can be achieved with the "pointer()" function, although "pointer()" does a lot more work since it constructs a real pointer object, so it is faster to use "byref()" if you don’t need the pointer object in Python itself: >>> i = c_int() >>> f = c_float() >>> s = create_string_buffer(b'\000' * 32) >>> print(i.value, f.value, repr(s.value)) 0 0.0 b'' >>> libc.sscanf(b"1 3.14 Hello", b"%d %f %s", ... byref(i), byref(f), s) 3 >>> print(i.value, f.value, repr(s.value)) 1 3.1400001049 b'Hello' >>> Structures and unions --------------------- Structures and unions must derive from the "Structure" and "Union" base classes which are defined in the "ctypes" module. Each subclass must define a "_fields_" attribute. "_fields_" must be a list of *2-tuples*, containing a *field name* and a *field type*. The field type must be a "ctypes" type like "c_int", or any other derived "ctypes" type: structure, union, array, pointer. Here is a simple example of a POINT structure, which contains two integers named *x* and *y*, and also shows how to initialize a structure in the constructor: >>> from ctypes import * >>> class POINT(Structure): ... _fields_ = [("x", c_int), ... ("y", c_int)] ... >>> point = POINT(10, 20) >>> print(point.x, point.y) 10 20 >>> point = POINT(y=5) >>> print(point.x, point.y) 0 5 >>> POINT(1, 2, 3) Traceback (most recent call last): File "", line 1, in TypeError: too many initializers >>> You can, however, build much more complicated structures. A structure can itself contain other structures by using a structure as a field type. Here is a RECT structure which contains two POINTs named *upperleft* and *lowerright*: >>> class RECT(Structure): ... _fields_ = [("upperleft", POINT), ... ("lowerright", POINT)] ... >>> rc = RECT(point) >>> print(rc.upperleft.x, rc.upperleft.y) 0 5 >>> print(rc.lowerright.x, rc.lowerright.y) 0 0 >>> Nested structures can also be initialized in the constructor in several ways: >>> r = RECT(POINT(1, 2), POINT(3, 4)) >>> r = RECT((1, 2), (3, 4)) Field *descriptor*s can be retrieved from the *class*, they are useful for debugging because they can provide useful information: >>> print(POINT.x) >>> print(POINT.y) >>> Warning: "ctypes" does not support passing unions or structures with bit- fields to functions by value. While this may work on 32-bit x86, it’s not guaranteed by the library to work in the general case. Unions and structures with bit-fields should always be passed to functions by pointer. Structure/union alignment and byte order ---------------------------------------- By default, Structure and Union fields are aligned in the same way the C compiler does it. It is possible to override this behavior by specifying a "_pack_" class attribute in the subclass definition. This must be set to a positive integer and specifies the maximum alignment for the fields. This is what "#pragma pack(n)" also does in MSVC. It is also possible to set a minimum alignment for how the subclass itself is packed in the same way "#pragma align(n)" works in MSVC. This can be achieved by specifying a "_align_" class attribute in the subclass definition. "ctypes" uses the native byte order for Structures and Unions. To build structures with non-native byte order, you can use one of the "BigEndianStructure", "LittleEndianStructure", "BigEndianUnion", and "LittleEndianUnion" base classes. These classes cannot contain pointer fields. 
Bit fields in structures and unions
-----------------------------------

It is possible to create structures and unions containing bit fields. Bit fields are only possible for integer fields; the bit width is specified as the third item in the "_fields_" tuples:

   >>> class Int(Structure):
   ...     _fields_ = [("first_16", c_int, 16),
   ...                 ("second_16", c_int, 16)]
   ...
   >>> print(Int.first_16)
   <Field type=c_long, ofs=0:0, bits=16>
   >>> print(Int.second_16)
   <Field type=c_long, ofs=0:16, bits=16>
   >>>

Arrays
------

Arrays are sequences, containing a fixed number of instances of the same type.

The recommended way to create array types is by multiplying a data type with a positive integer:

   TenPointsArrayType = POINT * 10

Here is an example of a somewhat artificial data type, a structure containing 4 POINTs among other stuff:

   >>> from ctypes import *
   >>> class POINT(Structure):
   ...     _fields_ = ("x", c_int), ("y", c_int)
   ...
   >>> class MyStruct(Structure):
   ...     _fields_ = [("a", c_int),
   ...                 ("b", c_float),
   ...                 ("point_array", POINT * 4)]
   >>>
   >>> print(len(MyStruct().point_array))
   4
   >>>

Instances are created in the usual way, by calling the class:

   arr = TenPointsArrayType()
   for pt in arr:
       print(pt.x, pt.y)

The above code prints a series of "0 0" lines, because the array contents are initialized to zeros.

Initializers of the correct type can also be specified:

   >>> from ctypes import *
   >>> TenIntegers = c_int * 10
   >>> ii = TenIntegers(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
   >>> print(ii)
   <c_long_Array_10 object at 0x...>
   >>> for i in ii: print(i, end=" ")
   ...
   1 2 3 4 5 6 7 8 9 10
   >>>

Pointers
--------

Pointer instances are created by calling the "pointer()" function on a "ctypes" type:

   >>> from ctypes import *
   >>> i = c_int(42)
   >>> pi = pointer(i)
   >>>

Pointer instances have a "contents" attribute which returns the object to which the pointer points, the "i" object above:

   >>> pi.contents
   c_long(42)
   >>>

Note that "ctypes" does not have OOR (original object return), it constructs a new, equivalent object each time you retrieve an attribute:

   >>> pi.contents is i
   False
   >>> pi.contents is pi.contents
   False
   >>>

Assigning another "c_int" instance to the pointer’s contents attribute would cause the pointer to point to the memory location where this is stored:

   >>> i = c_int(99)
   >>> pi.contents = i
   >>> pi.contents
   c_long(99)
   >>>

Pointer instances can also be indexed with integers:

   >>> pi[0]
   99
   >>>

Assigning to an integer index changes the pointed-to value:

   >>> print(i)
   c_long(99)
   >>> pi[0] = 22
   >>> print(i)
   c_long(22)
   >>>

It is also possible to use indexes different from 0, but you must know what you’re doing, just as in C: You can access or change arbitrary memory locations. Generally you only use this feature if you receive a pointer from a C function, and you *know* that the pointer actually points to an array instead of a single item.

Behind the scenes, the "pointer()" function does more than simply create pointer instances, it has to create pointer *types* first. This is done with the "POINTER()" function, which accepts any "ctypes" type, and returns a new type:

   >>> PI = POINTER(c_int)
   >>> PI
   <class 'ctypes.LP_c_long'>
   >>> PI(42)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: expected c_long instead of int
   >>> PI(c_int(42))
   <ctypes.LP_c_long object at 0x...>
   >>>

Calling the pointer type without an argument creates a "NULL" pointer. "NULL" pointers have a "False" boolean value:

   >>> null_ptr = POINTER(c_int)()
   >>> print(bool(null_ptr))
   False
   >>>

"ctypes" checks for "NULL" when dereferencing pointers (but dereferencing invalid non-"NULL" pointers would crash Python):

   >>> null_ptr[0]
   Traceback (most recent call last):
       ....
   ValueError: NULL pointer access
   >>>
   >>> null_ptr[0] = 1234
   Traceback (most recent call last):
       ....
   ValueError: NULL pointer access
   >>>
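Since "NULL" pointers are false, a simple guard (a sketch, not part of the ctypes API) avoids the "ValueError":

   from ctypes import POINTER, c_int, pointer

   def deref_or_default(ptr, default=0):
       # a NULL pointer is falsy; only index through valid pointers
       return ptr[0] if ptr else default

   print(deref_or_default(POINTER(c_int)()))     # 0  (NULL pointer)
   print(deref_or_default(pointer(c_int(7))))    # 7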
Type conversions
----------------

Usually, ctypes does strict type checking. This means, if you have "POINTER(c_int)" in the "argtypes" list of a function or as the type of a member field in a structure definition, only instances of exactly the same type are accepted. There are some exceptions to this rule, where ctypes accepts other objects. For example, you can pass compatible array instances instead of pointer types. So, for "POINTER(c_int)", ctypes accepts an array of c_int:

   >>> class Bar(Structure):
   ...     _fields_ = [("count", c_int), ("values", POINTER(c_int))]
   ...
   >>> bar = Bar()
   >>> bar.values = (c_int * 3)(1, 2, 3)
   >>> bar.count = 3
   >>> for i in range(bar.count):
   ...     print(bar.values[i])
   ...
   1
   2
   3
   >>>

In addition, if a function argument is explicitly declared to be a pointer type (such as "POINTER(c_int)") in "argtypes", an object of the pointed type ("c_int" in this case) can be passed to the function. ctypes will apply the required "byref()" conversion in this case automatically.

To set a POINTER type field to "NULL", you can assign "None":

   >>> bar.values = None
   >>>

Sometimes you have instances of incompatible types. In C, you can cast one type into another type. "ctypes" provides a "cast()" function which can be used in the same way. The "Bar" structure defined above accepts "POINTER(c_int)" pointers or "c_int" arrays for its "values" field, but not instances of other types:

   >>> bar.values = (c_byte * 4)()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: incompatible types, c_byte_Array_4 instance instead of LP_c_long instance
   >>>

For these cases, the "cast()" function is handy.

The "cast()" function can be used to cast a ctypes instance into a pointer to a different ctypes data type. "cast()" takes two parameters, a ctypes object that is or can be converted to a pointer of some kind, and a ctypes pointer type. It returns an instance of the second argument, which references the same memory block as the first argument:

   >>> a = (c_byte * 4)()
   >>> cast(a, POINTER(c_int))
   <ctypes.LP_c_long object at ...>
   >>>

So, "cast()" can be used to assign to the "values" field of the "Bar" structure:

   >>> bar = Bar()
   >>> bar.values = cast((c_byte * 4)(), POINTER(c_int))
   >>> print(bar.values[0])
   0
   >>>

Incomplete Types
----------------

*Incomplete Types* are structures, unions or arrays whose members are not yet specified. In C, they are specified by forward declarations, which are defined later:

   struct cell;                    /* forward declaration */

   struct cell {
       char *name;
       struct cell *next;
   };

The straightforward translation into ctypes code would be this, but it does not work:

   >>> class cell(Structure):
   ...     _fields_ = [("name", c_char_p),
   ...                 ("next", POINTER(cell))]
   ...
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "<stdin>", line 2, in cell
   NameError: name 'cell' is not defined
   >>>

because the new "class cell" is not available in the class statement itself. In "ctypes", we can define the "cell" class and set the "_fields_" attribute later, after the class statement:

   >>> from ctypes import *
   >>> class cell(Structure):
   ...     pass
   ...
   >>> cell._fields_ = [("name", c_char_p),
   ...                  ("next", POINTER(cell))]
   >>>

Let’s try it.
We create two instances of "cell", and let them point to each other, and finally follow the pointer chain a few times: >>> c1 = cell() >>> c1.name = b"foo" >>> c2 = cell() >>> c2.name = b"bar" >>> c1.next = pointer(c2) >>> c2.next = pointer(c1) >>> p = c1 >>> for i in range(8): ... print(p.name, end=" ") ... p = p.next[0] ... foo bar foo bar foo bar foo bar >>> Callback functions ------------------ "ctypes" allows creating C callable function pointers from Python callables. These are sometimes called *callback functions*. First, you must create a class for the callback function. The class knows the calling convention, the return type, and the number and types of arguments this function will receive. The "CFUNCTYPE()" factory function creates types for callback functions using the "cdecl" calling convention. On Windows, the "WINFUNCTYPE()" factory function creates types for callback functions using the "stdcall" calling convention. Both of these factory functions are called with the result type as first argument, and the callback functions expected argument types as the remaining arguments. I will present an example here which uses the standard C library’s "qsort()" function, that is used to sort items with the help of a callback function. "qsort()" will be used to sort an array of integers: >>> IntArray5 = c_int * 5 >>> ia = IntArray5(5, 1, 7, 33, 99) >>> qsort = libc.qsort >>> qsort.restype = None >>> "qsort()" must be called with a pointer to the data to sort, the number of items in the data array, the size of one item, and a pointer to the comparison function, the callback. The callback will then be called with two pointers to items, and it must return a negative integer if the first item is smaller than the second, a zero if they are equal, and a positive integer otherwise. So our callback function receives pointers to integers, and must return an integer. First we create the "type" for the callback function: >>> CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int)) >>> To get started, here is a simple callback that shows the values it gets passed: >>> def py_cmp_func(a, b): ... print("py_cmp_func", a[0], b[0]) ... return 0 ... >>> cmp_func = CMPFUNC(py_cmp_func) >>> The result: >>> qsort(ia, len(ia), sizeof(c_int), cmp_func) py_cmp_func 5 1 py_cmp_func 33 99 py_cmp_func 7 33 py_cmp_func 5 7 py_cmp_func 1 7 >>> Now we can actually compare the two items and return a useful result: >>> def py_cmp_func(a, b): ... print("py_cmp_func", a[0], b[0]) ... return a[0] - b[0] ... >>> >>> qsort(ia, len(ia), sizeof(c_int), CMPFUNC(py_cmp_func)) py_cmp_func 5 1 py_cmp_func 33 99 py_cmp_func 7 33 py_cmp_func 1 7 py_cmp_func 5 7 >>> As we can easily check, our array is sorted now: >>> for i in ia: print(i, end=" ") ... 1 5 7 33 99 >>> The function factories can be used as decorator factories, so we may as well write: >>> @CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int)) ... def py_cmp_func(a, b): ... print("py_cmp_func", a[0], b[0]) ... return a[0] - b[0] ... >>> qsort(ia, len(ia), sizeof(c_int), py_cmp_func) py_cmp_func 5 1 py_cmp_func 33 99 py_cmp_func 7 33 py_cmp_func 1 7 py_cmp_func 5 7 >>> Note: Make sure you keep references to "CFUNCTYPE()" objects as long as they are used from C code. "ctypes" doesn’t, and if you don’t, they may be garbage collected, crashing your program when a callback is made.Also, note that if the callback function is called in a thread created outside of Python’s control (e.g. 
by the foreign code that calls the callback), ctypes creates a new dummy Python thread on every invocation. This behavior is correct for most purposes, but it means that values stored with "threading.local" will *not* survive across different callbacks, even when those calls are made from the same C thread. Accessing values exported from dlls ----------------------------------- Some shared libraries not only export functions, they also export variables. An example in the Python library itself is the "Py_Version", Python runtime version number encoded in a single constant integer. "ctypes" can access values like this with the "in_dll()" class methods of the type. *pythonapi* is a predefined symbol giving access to the Python C api: >>> version = ctypes.c_int.in_dll(ctypes.pythonapi, "Py_Version") >>> print(hex(version.value)) 0x30c00a0 An extended example which also demonstrates the use of pointers accesses the "PyImport_FrozenModules" pointer exported by Python. Quoting the docs for that value: This pointer is initialized to point to an array of "_frozen" records, terminated by one whose members are all "NULL" or zero. When a frozen module is imported, it is searched in this table. Third-party code could play tricks with this to provide a dynamically created collection of frozen modules. So manipulating this pointer could even prove useful. To restrict the example size, we show only how this table can be read with "ctypes": >>> from ctypes import * >>> >>> class struct_frozen(Structure): ... _fields_ = [("name", c_char_p), ... ("code", POINTER(c_ubyte)), ... ("size", c_int), ... ("get_code", POINTER(c_ubyte)), # Function pointer ... ] ... >>> We have defined the "_frozen" data type, so we can get the pointer to the table: >>> FrozenTable = POINTER(struct_frozen) >>> table = FrozenTable.in_dll(pythonapi, "_PyImport_FrozenBootstrap") >>> Since "table" is a "pointer" to the array of "struct_frozen" records, we can iterate over it, but we just have to make sure that our loop terminates, because pointers have no size. Sooner or later it would probably crash with an access violation or whatever, so it’s better to break out of the loop when we hit the "NULL" entry: >>> for item in table: ... if item.name is None: ... break ... print(item.name.decode("ascii"), item.size) ... _frozen_importlib 31764 _frozen_importlib_external 41499 zipimport 12345 >>> The fact that standard Python has a frozen module and a frozen package (indicated by the negative "size" member) is not well known, it is only used for testing. Try it out with "import __hello__" for example. Surprises --------- There are some edges in "ctypes" where you might expect something other than what actually happens. Consider the following example: >>> from ctypes import * >>> class POINT(Structure): ... _fields_ = ("x", c_int), ("y", c_int) ... >>> class RECT(Structure): ... _fields_ = ("a", POINT), ("b", POINT) ... >>> p1 = POINT(1, 2) >>> p2 = POINT(3, 4) >>> rc = RECT(p1, p2) >>> print(rc.a.x, rc.a.y, rc.b.x, rc.b.y) 1 2 3 4 >>> # now swap the two points >>> rc.a, rc.b = rc.b, rc.a >>> print(rc.a.x, rc.a.y, rc.b.x, rc.b.y) 3 4 3 4 >>> Hm. We certainly expected the last statement to print "3 4 1 2". What happened? Here are the steps of the "rc.a, rc.b = rc.b, rc.a" line above: >>> temp0, temp1 = rc.b, rc.a >>> rc.a = temp0 >>> rc.b = temp1 >>> Note that "temp0" and "temp1" are objects still using the internal buffer of the "rc" object above. So executing "rc.a = temp0" copies the buffer contents of "temp0" into "rc" ‘s buffer. 
This, in turn, changes the contents of "temp1". So, the last assignment "rc.b = temp1", doesn’t have the expected effect. Keep in mind that retrieving sub-objects from Structure, Unions, and Arrays doesn’t *copy* the sub-object, instead it retrieves a wrapper object accessing the root-object’s underlying buffer. Another example that may behave differently from what one would expect is this: >>> s = c_char_p() >>> s.value = b"abc def ghi" >>> s.value b'abc def ghi' >>> s.value is s.value False >>> Note: Objects instantiated from "c_char_p" can only have their value set to bytes or integers. Why is it printing "False"? ctypes instances are objects containing a memory block plus some *descriptor*s accessing the contents of the memory. Storing a Python object in the memory block does not store the object itself, instead the "contents" of the object is stored. Accessing the contents again constructs a new Python object each time! Variable-sized data types ------------------------- "ctypes" provides some support for variable-sized arrays and structures. The "resize()" function can be used to resize the memory buffer of an existing ctypes object. The function takes the object as first argument, and the requested size in bytes as the second argument. The memory block cannot be made smaller than the natural memory block specified by the objects type, a "ValueError" is raised if this is tried: >>> short_array = (c_short * 4)() >>> print(sizeof(short_array)) 8 >>> resize(short_array, 4) Traceback (most recent call last): ... ValueError: minimum size is 8 >>> resize(short_array, 32) >>> sizeof(short_array) 32 >>> sizeof(type(short_array)) 8 >>> This is nice and fine, but how would one access the additional elements contained in this array? Since the type still only knows about 4 elements, we get errors accessing other elements: >>> short_array[:] [0, 0, 0, 0] >>> short_array[7] Traceback (most recent call last): ... IndexError: invalid index >>> Another way to use variable-sized data types with "ctypes" is to use the dynamic nature of Python, and (re-)define the data type after the required size is already known, on a case by case basis. ctypes reference ================ Finding shared libraries ------------------------ When programming in a compiled language, shared libraries are accessed when compiling/linking a program, and when the program is run. The purpose of the "find_library()" function is to locate a library in a way similar to what the compiler or runtime loader does (on platforms with several versions of a shared library the most recent should be loaded), while the ctypes library loaders act like when a program is run, and call the runtime loader directly. The "ctypes.util" module provides a function which can help to determine the library to load. ctypes.util.find_library(name) Try to find a library and return a pathname. *name* is the library name without any prefix like *lib*, suffix like ".so", ".dylib" or version number (this is the form used for the posix linker option "-l"). If no library can be found, returns "None". The exact functionality is system dependent. On Linux, "find_library()" tries to run external programs ("/sbin/ldconfig", "gcc", "objdump" and "ld") to find the library file. It returns the filename of the library file. Changed in version 3.6: On Linux, the value of the environment variable "LD_LIBRARY_PATH" is used when searching for libraries, if a library cannot be found by any other means. 
Here are some examples:

   >>> from ctypes.util import find_library
   >>> find_library("m")
   'libm.so.6'
   >>> find_library("c")
   'libc.so.6'
   >>> find_library("bz2")
   'libbz2.so.1.0'
   >>>

On macOS and Android, "find_library()" uses the system’s standard naming schemes and paths to locate the library, and returns a full pathname if successful:

   >>> from ctypes.util import find_library
   >>> find_library("c")
   '/usr/lib/libc.dylib'
   >>> find_library("m")
   '/usr/lib/libm.dylib'
   >>> find_library("bz2")
   '/usr/lib/libbz2.dylib'
   >>> find_library("AGL")
   '/System/Library/Frameworks/AGL.framework/AGL'
   >>>

On Windows, "find_library()" searches along the system search path, and returns the full pathname, but since there is no predefined naming scheme a call like "find_library("c")" will fail and return "None".

If wrapping a shared library with "ctypes", it *may* be better to determine the shared library name at development time, and hardcode that into the wrapper module instead of using "find_library()" to locate the library at runtime.

Loading shared libraries
------------------------

There are several ways to load shared libraries into the Python process. One way is to instantiate one of the following classes:

class ctypes.CDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False, winmode=None)

   Instances of this class represent loaded shared libraries. Functions in these libraries use the standard C calling convention, and are assumed to return int.

   On Windows creating a "CDLL" instance may fail even if the DLL name exists. When a dependent DLL of the loaded DLL is not found, an "OSError" is raised with the message *“[WinError 126] The specified module could not be found”.* This error message does not contain the name of the missing DLL because the Windows API does not return this information, making this error hard to diagnose. To resolve this error and determine which DLL is not found, you need to find the list of dependent DLLs and determine which one is not found using Windows debugging and tracing tools.

   Changed in version 3.12: The *name* parameter can now be a *path-like object*.

   See also: Microsoft DUMPBIN tool – A tool to find DLL dependents.

class ctypes.OleDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False, winmode=None)

   Instances of this class represent loaded shared libraries, functions in these libraries use the "stdcall" calling convention, and are assumed to return the Windows-specific "HRESULT" code. "HRESULT" values contain information specifying whether the function call failed or succeeded, together with an additional error code. If the return value signals a failure, an "OSError" is automatically raised.

   Availability: Windows

   Changed in version 3.3: "WindowsError" used to be raised, which is now an alias of "OSError".

   Changed in version 3.12: The *name* parameter can now be a *path-like object*.

class ctypes.WinDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False, winmode=None)

   Instances of this class represent loaded shared libraries, functions in these libraries use the "stdcall" calling convention, and are assumed to return int by default.

   Availability: Windows

   Changed in version 3.12: The *name* parameter can now be a *path-like object*.

The Python *global interpreter lock* is released before calling any function exported by these libraries, and reacquired afterwards.
class ctypes.PyDLL(name, mode=DEFAULT_MODE, handle=None) Instances of this class behave like "CDLL" instances, except that the Python GIL is *not* released during the function call, and after the function execution the Python error flag is checked. If the error flag is set, a Python exception is raised. Thus, this is only useful to call Python C api functions directly. Changed in version 3.12: The *name* parameter can now be a *path- like object*. All these classes can be instantiated by calling them with at least one argument, the pathname of the shared library. If you have an existing handle to an already loaded shared library, it can be passed as the "handle" named parameter, otherwise the underlying platform’s "dlopen()" or "LoadLibrary()" function is used to load the library into the process, and to get a handle to it. The *mode* parameter can be used to specify how the library is loaded. For details, consult the *dlopen(3)* manpage. On Windows, *mode* is ignored. On posix systems, RTLD_NOW is always added, and is not configurable. The *use_errno* parameter, when set to true, enables a ctypes mechanism that allows accessing the system "errno" error number in a safe way. "ctypes" maintains a thread-local copy of the system’s "errno" variable; if you call foreign functions created with "use_errno=True" then the "errno" value before the function call is swapped with the ctypes private copy, the same happens immediately after the function call. The function "ctypes.get_errno()" returns the value of the ctypes private copy, and the function "ctypes.set_errno()" changes the ctypes private copy to a new value and returns the former value. The *use_last_error* parameter, when set to true, enables the same mechanism for the Windows error code which is managed by the "GetLastError()" and "SetLastError()" Windows API functions; "ctypes.get_last_error()" and "ctypes.set_last_error()" are used to request and change the ctypes private copy of the windows error code. The *winmode* parameter is used on Windows to specify how the library is loaded (since *mode* is ignored). It takes any value that is valid for the Win32 API "LoadLibraryEx" flags parameter. When omitted, the default is to use the flags that result in the most secure DLL load, which avoids issues such as DLL hijacking. Passing the full path to the DLL is the safest way to ensure the correct library and dependencies are loaded. Changed in version 3.8: Added *winmode* parameter. ctypes.RTLD_GLOBAL Flag to use as *mode* parameter. On platforms where this flag is not available, it is defined as the integer zero. ctypes.RTLD_LOCAL Flag to use as *mode* parameter. On platforms where this is not available, it is the same as *RTLD_GLOBAL*. ctypes.DEFAULT_MODE The default mode which is used to load shared libraries. On OSX 10.3, this is *RTLD_GLOBAL*, otherwise it is the same as *RTLD_LOCAL*. Instances of these classes have no public methods. Functions exported by the shared library can be accessed as attributes or by index. Please note that accessing the function through an attribute caches the result and therefore accessing it repeatedly returns the same object each time. 
On the other hand, accessing it through an index returns a new object each time: >>> from ctypes import CDLL >>> libc = CDLL("libc.so.6") # On Linux >>> libc.time == libc.time True >>> libc['time'] == libc['time'] False The following public attributes are available, their name starts with an underscore to not clash with exported function names: PyDLL._handle The system handle used to access the library. PyDLL._name The name of the library passed in the constructor. Shared libraries can also be loaded by using one of the prefabricated objects, which are instances of the "LibraryLoader" class, either by calling the "LoadLibrary()" method, or by retrieving the library as attribute of the loader instance. class ctypes.LibraryLoader(dlltype) Class which loads shared libraries. *dlltype* should be one of the "CDLL", "PyDLL", "WinDLL", or "OleDLL" types. "__getattr__()" has special behavior: It allows loading a shared library by accessing it as attribute of a library loader instance. The result is cached, so repeated attribute accesses return the same library each time. LoadLibrary(name) Load a shared library into the process and return it. This method always returns a new instance of the library. These prefabricated library loaders are available: ctypes.cdll Creates "CDLL" instances. ctypes.windll Creates "WinDLL" instances. Availability: Windows ctypes.oledll Creates "OleDLL" instances. Availability: Windows ctypes.pydll Creates "PyDLL" instances. For accessing the C Python api directly, a ready-to-use Python shared library object is available: ctypes.pythonapi An instance of "PyDLL" that exposes Python C API functions as attributes. Note that all these functions are assumed to return C int, which is of course not always the truth, so you have to assign the correct "restype" attribute to use these functions. Loading a library through any of these objects raises an auditing event "ctypes.dlopen" with string argument "name", the name used to load the library. Accessing a function on a loaded library raises an auditing event "ctypes.dlsym" with arguments "library" (the library object) and "name" (the symbol’s name as a string or integer). In cases when only the library handle is available rather than the object, accessing a function raises an auditing event "ctypes.dlsym/handle" with arguments "handle" (the raw library handle) and "name". Foreign functions ----------------- As explained in the previous section, foreign functions can be accessed as attributes of loaded shared libraries. The function objects created in this way by default accept any number of arguments, accept any ctypes data instances as arguments, and return the default result type specified by the library loader. They are instances of a private local class "_FuncPtr" (not exposed in "ctypes") which inherits from the private "_CFuncPtr" class: >>> import ctypes >>> lib = ctypes.CDLL(None) >>> issubclass(lib._FuncPtr, ctypes._CFuncPtr) True >>> lib._FuncPtr is ctypes._CFuncPtr False class ctypes._CFuncPtr Base class for C callable foreign functions. Instances of foreign functions are also C compatible data types; they represent C function pointers. This behavior can be customized by assigning to special attributes of the foreign function object. restype Assign a ctypes type to specify the result type of the foreign function. Use "None" for void, a function not returning anything. 
It is possible to assign a callable Python object that is not a ctypes type, in this case the function is assumed to return a C int, and the callable will be called with this integer, allowing further processing or error checking. Using this is deprecated, for more flexible post processing or error checking use a ctypes data type as "restype" and assign a callable to the "errcheck" attribute. argtypes Assign a tuple of ctypes types to specify the argument types that the function accepts. Functions using the "stdcall" calling convention can only be called with the same number of arguments as the length of this tuple; functions using the C calling convention accept additional, unspecified arguments as well. When a foreign function is called, each actual argument is passed to the "from_param()" class method of the items in the "argtypes" tuple, this method allows adapting the actual argument to an object that the foreign function accepts. For example, a "c_char_p" item in the "argtypes" tuple will convert a string passed as argument into a bytes object using ctypes conversion rules. New: It is now possible to put items in argtypes which are not ctypes types, but each item must have a "from_param()" method which returns a value usable as argument (integer, string, ctypes instance). This allows defining adapters that can adapt custom objects as function parameters. errcheck Assign a Python function or another callable to this attribute. The callable will be called with three or more arguments: callable(result, func, arguments) *result* is what the foreign function returns, as specified by the "restype" attribute. *func* is the foreign function object itself, this allows reusing the same callable object to check or post process the results of several functions. *arguments* is a tuple containing the parameters originally passed to the function call, this allows specializing the behavior on the arguments used. The object that this function returns will be returned from the foreign function call, but it can also check the result value and raise an exception if the foreign function call failed. exception ctypes.ArgumentError This exception is raised when a foreign function call cannot convert one of the passed arguments. On Windows, when a foreign function call raises a system exception (for example, due to an access violation), it will be captured and replaced with a suitable Python exception. Further, an auditing event "ctypes.set_exception" with argument "code" will be raised, allowing an audit hook to replace the exception with its own. Some ways to invoke foreign function calls may raise an auditing event "ctypes.call_function" with arguments "function pointer" and "arguments". Function prototypes ------------------- Foreign functions can also be created by instantiating function prototypes. Function prototypes are similar to function prototypes in C; they describe a function (return type, argument types, calling convention) without defining an implementation. The factory functions must be called with the desired result type and the argument types of the function, and can be used as decorator factories, and as such, be applied to functions through the "@wrapper" syntax. See Callback functions for examples. ctypes.CFUNCTYPE(restype, *argtypes, use_errno=False, use_last_error=False) The returned function prototype creates functions that use the standard C calling convention. The function will release the GIL during the call. 
   If *use_errno* is set to true, the ctypes private copy of the
   system "errno" variable is exchanged with the real "errno" value
   before and after the call; *use_last_error* does the same for the
   Windows error code.

ctypes.WINFUNCTYPE(restype, *argtypes, use_errno=False, use_last_error=False)

   The returned function prototype creates functions that use the
   "stdcall" calling convention. The function will release the GIL
   during the call. *use_errno* and *use_last_error* have the same
   meaning as above.

   Availability: Windows

ctypes.PYFUNCTYPE(restype, *argtypes)

   The returned function prototype creates functions that use the
   Python calling convention. The function will *not* release the GIL
   during the call.

Function prototypes created by these factory functions can be
instantiated in different ways, depending on the type and number of
the parameters in the call:

prototype(address)

   Returns a foreign function at the specified address, which must be
   an integer.

prototype(callable)

   Create a C callable function (a callback function) from a Python
   *callable*.

prototype(func_spec[, paramflags])

   Returns a foreign function exported by a shared library.
   *func_spec* must be a 2-tuple "(name_or_ordinal, library)". The
   first item is the name of the exported function as string, or the
   ordinal of the exported function as small integer. The second item
   is the shared library instance.

prototype(vtbl_index, name[, paramflags[, iid]])

   Returns a foreign function that will call a COM method.
   *vtbl_index* is the index into the virtual function table, a small
   non-negative integer. *name* is the name of the COM method. *iid*
   is an optional pointer to the interface identifier which is used
   in extended error reporting. COM methods use a special calling
   convention: they require a pointer to the COM interface as first
   argument, in addition to those parameters that are specified in
   the "argtypes" tuple.

The optional *paramflags* parameter creates foreign function wrappers
with much more functionality than the features described above.

*paramflags* must be a tuple of the same length as "argtypes". Each
item in this tuple contains further information about a parameter; it
must be a tuple containing one, two, or three items.

The first item is an integer containing a combination of direction
flags for the parameter:

   1
      Specifies an input parameter to the function.

   2
      Specifies an output parameter; the foreign function fills in a
      value.

   4
      Specifies an input parameter which defaults to the integer
      zero.

The optional second item is the parameter name as string. If this is
specified, the foreign function can be called with named parameters.

The optional third item is the default value for this parameter.
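As an illustration of the "prototype(callable)" form above, here is a
minimal sketch (not part of the original reference) that passes a
Python comparison callback to the C "qsort()" function; it assumes a
Linux libc named "libc.so.6":

   >>> from ctypes import CDLL, CFUNCTYPE, POINTER, c_int, sizeof
   >>> libc = CDLL("libc.so.6")
   >>> libc.qsort.restype = None                    # qsort() returns void
   >>> CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))
   >>> def py_cmp(a, b):
   ...     return a[0] - b[0]                       # compare the pointed-to ints
   ...
   >>> ints = (c_int * 4)(4, 1, 3, 2)
   >>> libc.qsort(ints, len(ints), sizeof(c_int), CMPFUNC(py_cmp))
   >>> ints[:]
   [1, 2, 3, 4]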
The following example demonstrates how to wrap the Windows
"MessageBoxW" function so that it supports default parameters and
named arguments. The C declaration from the windows header file is
this:

   WINUSERAPI int WINAPI
   MessageBoxW(
       HWND hWnd,
       LPCWSTR lpText,
       LPCWSTR lpCaption,
       UINT uType);

Here is the wrapping with "ctypes":

   >>> from ctypes import c_int, WINFUNCTYPE, windll
   >>> from ctypes.wintypes import HWND, LPCWSTR, UINT
   >>> prototype = WINFUNCTYPE(c_int, HWND, LPCWSTR, LPCWSTR, UINT)
   >>> paramflags = (1, "hwnd", 0), (1, "text", "Hi"), (1, "caption", "Hello from ctypes"), (1, "flags", 0)
   >>> MessageBox = prototype(("MessageBoxW", windll.user32), paramflags)

The "MessageBox" foreign function can now be called in these ways:

   >>> MessageBox()
   >>> MessageBox(text="Spam, spam, spam")
   >>> MessageBox(flags=2, text="foo bar")

A second example demonstrates output parameters. The win32
"GetWindowRect" function retrieves the dimensions of a specified
window by copying them into a "RECT" structure that the caller has to
supply. Here is the C declaration:

   WINUSERAPI BOOL WINAPI
   GetWindowRect(
       HWND hWnd,
       LPRECT lpRect);

Here is the wrapping with "ctypes":

   >>> from ctypes import POINTER, WINFUNCTYPE, windll, WinError
   >>> from ctypes.wintypes import BOOL, HWND, RECT
   >>> prototype = WINFUNCTYPE(BOOL, HWND, POINTER(RECT))
   >>> paramflags = (1, "hwnd"), (2, "lprect")
   >>> GetWindowRect = prototype(("GetWindowRect", windll.user32), paramflags)

Functions with output parameters will automatically return the output
parameter value if there is a single one, or a tuple containing the
output parameter values when there are more than one, so the
"GetWindowRect" function now returns a "RECT" instance when called.

Output parameters can be combined with the "errcheck" protocol to do
further output processing and error checking. The win32
"GetWindowRect" API function returns a "BOOL" to signal success or
failure, so this function could do the error checking and raise an
exception when the API call failed:

   >>> def errcheck(result, func, args):
   ...     if not result:
   ...         raise WinError()
   ...     return args
   ...
   >>> GetWindowRect.errcheck = errcheck

If the "errcheck" function returns the argument tuple it receives
unchanged, "ctypes" continues the normal processing it does on the
output parameters. If you want to return a tuple of window
coordinates instead of a "RECT" instance, you can retrieve the fields
in the function and return them instead; the normal processing will
no longer take place:

   >>> def errcheck(result, func, args):
   ...     if not result:
   ...         raise WinError()
   ...     rc = args[1]
   ...     return rc.left, rc.top, rc.bottom, rc.right
   ...
   >>> GetWindowRect.errcheck = errcheck

Utility functions
-----------------

ctypes.addressof(obj)

   Returns the address of the memory buffer as an integer. *obj* must
   be an instance of a ctypes type.

   Raises an auditing event "ctypes.addressof" with argument "obj".

ctypes.alignment(obj_or_type)

   Returns the alignment requirements of a ctypes type. *obj_or_type*
   must be a ctypes type or instance.

ctypes.byref(obj[, offset])

   Returns a light-weight pointer to *obj*, which must be an instance
   of a ctypes type. *offset* defaults to zero, and must be an
   integer that will be added to the internal pointer value.

   "byref(obj, offset)" corresponds to this C code:

      (((char *)&obj) + offset)

   The returned object can only be used as a foreign function call
   parameter. It behaves similarly to "pointer(obj)", but the
   construction is a lot faster.

ctypes.cast(obj, type)

   This function is similar to the cast operator in C. It returns a
   new instance of *type* which points to the same memory block as
   *obj*. *type* must be a pointer type, and *obj* must be an object
   that can be interpreted as a pointer.
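   For instance, a small sketch (not from the original reference)
   that reinterprets a byte array as a pointer to C ints:

      >>> from ctypes import POINTER, c_byte, c_int, cast
      >>> buf = (c_byte * 8)()            # eight zero bytes
      >>> ip = cast(buf, POINTER(c_int))  # view the same memory as int*
      >>> ip[0]
      0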
ctypes.create_string_buffer(init, size=None)
ctypes.create_string_buffer(size)

   This function creates a mutable character buffer. The returned
   object is a ctypes array of "c_char".

   If *size* is given (and not "None"), it must be an "int". It
   specifies the size of the returned array.

   If the *init* argument is given, it must be "bytes". It is used to
   initialize the array items. Bytes not initialized this way are set
   to zero (NUL).

   If *size* is not given (or if it is "None"), the buffer is made
   one element larger than *init*, effectively adding a NUL
   terminator.

   If both arguments are given, *size* must not be less than
   "len(init)".

   Warning: If *size* is equal to "len(init)", a NUL terminator is
   not added. Do not treat such a buffer as a C string.

   For example:

      >>> bytes(create_string_buffer(2))
      b'\x00\x00'
      >>> bytes(create_string_buffer(b'ab'))
      b'ab\x00'
      >>> bytes(create_string_buffer(b'ab', 2))
      b'ab'
      >>> bytes(create_string_buffer(b'ab', 4))
      b'ab\x00\x00'
      >>> bytes(create_string_buffer(b'abcdef', 2))
      Traceback (most recent call last):
      ...
      ValueError: byte string too long

   Raises an auditing event "ctypes.create_string_buffer" with
   arguments "init", "size".

ctypes.create_unicode_buffer(init, size=None)
ctypes.create_unicode_buffer(size)

   This function creates a mutable unicode character buffer. The
   returned object is a ctypes array of "c_wchar".

   The function takes the same arguments as "create_string_buffer()"
   except that *init* must be a string and *size* counts "c_wchar".

   Raises an auditing event "ctypes.create_unicode_buffer" with
   arguments "init", "size".

ctypes.DllCanUnloadNow()

   This function is a hook which allows implementing in-process COM
   servers with ctypes. It is called from the DllCanUnloadNow
   function that the "_ctypes" extension dll exports.

   Availability: Windows

ctypes.DllGetClassObject()

   This function is a hook which allows implementing in-process COM
   servers with ctypes. It is called from the DllGetClassObject
   function that the "_ctypes" extension dll exports.

   Availability: Windows

ctypes.util.find_library(name)

   Try to find a library and return a pathname. *name* is the library
   name without any prefix like "lib", suffix like ".so", ".dylib" or
   version number (this is the form used for the posix linker option
   "-l"). If no library can be found, returns "None". The exact
   functionality is system dependent.

ctypes.util.find_msvcrt()

   Returns the filename of the VC runtime library used by Python, and
   by the extension modules. If the name of the library cannot be
   determined, "None" is returned.

   If you need to free memory, for example, allocated by an extension
   module with a call to "free(void *)", it is important that you use
   the function in the same library that allocated the memory.

   Availability: Windows

ctypes.FormatError([code])

   Returns a textual description of the error code *code*. If no
   error code is specified, the last error code is used by calling
   the Windows API function GetLastError.

   Availability: Windows

ctypes.GetLastError()

   Returns the last error code set by Windows in the calling thread.
   This function calls the Windows "GetLastError()" function
   directly; it does not return the ctypes-private copy of the error
   code.

   Availability: Windows

ctypes.get_errno()

   Returns the current value of the ctypes-private copy of the system
   "errno" variable in the calling thread.

   Raises an auditing event "ctypes.get_errno" with no arguments.
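   As a sketch of the errno round-trip (not from the original
   reference; it assumes a Linux libc named "libc.so.6"): load the
   library with "use_errno=True", then read the private copy after a
   failing call:

      >>> import ctypes, os
      >>> libc = ctypes.CDLL("libc.so.6", use_errno=True)
      >>> libc.chdir(b"/nonexistent")    # fails and sets errno
      -1
      >>> os.strerror(ctypes.get_errno())
      'No such file or directory'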
ctypes.get_last_error()

   Returns the current value of the ctypes-private copy of the system
   "LastError" variable in the calling thread.

   Availability: Windows

   Raises an auditing event "ctypes.get_last_error" with no
   arguments.

ctypes.memmove(dst, src, count)

   Same as the standard C memmove library function: copies *count*
   bytes from *src* to *dst*. *dst* and *src* must be integers or
   ctypes instances that can be converted to pointers.

ctypes.memset(dst, c, count)

   Same as the standard C memset library function: fills the memory
   block at address *dst* with *count* bytes of value *c*. *dst* must
   be an integer specifying an address, or a ctypes instance.

ctypes.POINTER(type, /)

   Create and return a new ctypes pointer type. Pointer types are
   cached and reused internally, so calling this function repeatedly
   is cheap. *type* must be a ctypes type.

ctypes.pointer(obj, /)

   Create a new pointer instance, pointing to *obj*. The returned
   object is of the type "POINTER(type(obj))".

   Note: If you just want to pass a pointer to an object to a foreign
   function call, you should use "byref(obj)", which is much faster.

ctypes.resize(obj, size)

   This function resizes the internal memory buffer of *obj*, which
   must be an instance of a ctypes type. It is not possible to make
   the buffer smaller than the native size of the object's type, as
   given by "sizeof(type(obj))", but it is possible to enlarge the
   buffer.

ctypes.set_errno(value)

   Set the current value of the ctypes-private copy of the system
   "errno" variable in the calling thread to *value* and return the
   previous value.

   Raises an auditing event "ctypes.set_errno" with argument "errno".

ctypes.set_last_error(value)

   Set the current value of the ctypes-private copy of the system
   "LastError" variable in the calling thread to *value* and return
   the previous value.

   Availability: Windows

   Raises an auditing event "ctypes.set_last_error" with argument
   "error".

ctypes.sizeof(obj_or_type)

   Returns the size in bytes of a ctypes type or instance memory
   buffer. Does the same as the C "sizeof" operator.

ctypes.string_at(ptr, size=-1)

   Return the byte string at void *ptr. If *size* is specified, it is
   used as size; otherwise the string is assumed to be
   zero-terminated.

   Raises an auditing event "ctypes.string_at" with arguments "ptr",
   "size".

ctypes.WinError(code=None, descr=None)

   This function is probably the worst-named thing in ctypes. It
   creates an instance of "OSError". If *code* is not specified,
   "GetLastError" is called to determine the error code. If *descr*
   is not specified, "FormatError()" is called to get a textual
   description of the error.

   Availability: Windows

   Changed in version 3.3: An instance of "WindowsError" used to be
   created, which is now an alias of "OSError".

ctypes.wstring_at(ptr, size=-1)

   Return the wide-character string at void *ptr. If *size* is
   specified, it is used as the number of characters of the string;
   otherwise the string is assumed to be zero-terminated.

   Raises an auditing event "ctypes.wstring_at" with arguments "ptr",
   "size".

Data types
----------

class ctypes._CData

   This non-public class is the common base class of all ctypes data
   types. Among other things, all ctypes type instances contain a
   memory block that holds C compatible data; the address of the
   memory block is returned by the "addressof()" helper function.
   Another instance variable is exposed as "_objects"; this contains
   other Python objects that need to be kept alive in case the memory
   block contains pointers.
Common methods of ctypes data types; these are all class methods (to
be exact, they are methods of the *metaclass*):

from_buffer(source[, offset])

   This method returns a ctypes instance that shares the buffer of
   the *source* object. The *source* object must support the
   writeable buffer interface. The optional *offset* parameter
   specifies an offset into the source buffer in bytes; the default
   is zero. If the source buffer is not large enough, a "ValueError"
   is raised.

   Raises an auditing event "ctypes.cdata/buffer" with arguments
   "pointer", "size", "offset".

from_buffer_copy(source[, offset])

   This method creates a ctypes instance, copying the buffer from the
   *source* object buffer, which must be readable. The optional
   *offset* parameter specifies an offset into the source buffer in
   bytes; the default is zero. If the source buffer is not large
   enough, a "ValueError" is raised.

   Raises an auditing event "ctypes.cdata/buffer" with arguments
   "pointer", "size", "offset".

from_address(address)

   This method returns a ctypes type instance using the memory
   specified by *address*, which must be an integer.

   This method, and others that indirectly call this method, raise an
   auditing event "ctypes.cdata" with argument "address".

from_param(obj)

   This method adapts *obj* to a ctypes type. It is called with the
   actual object used in a foreign function call when the type is
   present in the foreign function's "argtypes" tuple; it must return
   an object that can be used as a function call parameter.

   All ctypes data types have a default implementation of this
   classmethod that normally returns *obj* if that is an instance of
   the type. Some types accept other objects as well.

in_dll(library, name)

   This method returns a ctypes type instance exported by a shared
   library. *name* is the name of the symbol that exports the data,
   *library* is the loaded shared library.

Common instance variables of ctypes data types:

_b_base_

   Sometimes ctypes data instances do not own the memory block they
   contain; instead they share part of the memory block of a base
   object. The "_b_base_" read-only member is the root ctypes object
   that owns the memory block.

_b_needsfree_

   This read-only variable is true when the ctypes data instance has
   allocated the memory block itself, false otherwise.

_objects

   This member is either "None" or a dictionary containing Python
   objects that need to be kept alive so that the memory block
   contents are kept valid. This object is only exposed for
   debugging; never modify the contents of this dictionary.

Fundamental data types
----------------------

class ctypes._SimpleCData

   This non-public class is the base class of all fundamental ctypes
   data types. It is mentioned here because it contains the common
   attributes of the fundamental ctypes data types. "_SimpleCData" is
   a subclass of "_CData", so it inherits its methods and attributes.
   ctypes data types that are not and do not contain pointers can now
   be pickled.

   Instances have a single attribute:

   value

      This attribute contains the actual value of the instance. For
      integer and pointer types, it is an integer, for character
      types, it is a single character bytes object or string, for
      character pointer types it is a Python bytes object or string.

      When the "value" attribute is retrieved from a ctypes instance,
      usually a new object is returned each time. "ctypes" does *not*
      implement original object return; a new object is constructed
      on each access. The same is true for all other ctypes object
      instances.
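      A short sketch (not from the original reference) of the "value"
      attribute in action:

         >>> from ctypes import c_int, c_char_p
         >>> i = c_int(42)
         >>> i.value
         42
         >>> i.value = -99            # the instance is mutated in place
         >>> i.value
         -99
         >>> c_char_p(b"spam").value
         b'spam'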
Fundamental data types, when returned as foreign function call
results, or, for example, by retrieving structure field members or
array items, are transparently converted to native Python types. In
other words, if a foreign function has a "restype" of "c_char_p", you
will always receive a Python bytes object, *not* a "c_char_p"
instance.

Subclasses of fundamental data types do *not* inherit this behavior.
So, if a foreign function's "restype" is a subclass of "c_void_p",
you will receive an instance of this subclass from the function call.
Of course, you can get the value of the pointer by accessing the
"value" attribute.

These are the fundamental ctypes data types:

class ctypes.c_byte

   Represents the C signed char datatype, and interprets the value as
   a small integer. The constructor accepts an optional integer
   initializer; no overflow checking is done.

class ctypes.c_char

   Represents the C char datatype, and interprets the value as a
   single character. The constructor accepts an optional string
   initializer; the length of the string must be exactly one
   character.

class ctypes.c_char_p

   Represents the C char* datatype when it points to a
   zero-terminated string. For a general character pointer that may
   also point to binary data, "POINTER(c_char)" must be used. The
   constructor accepts an integer address, or a bytes object.

class ctypes.c_double

   Represents the C double datatype. The constructor accepts an
   optional float initializer.

class ctypes.c_longdouble

   Represents the C long double datatype. The constructor accepts an
   optional float initializer. On platforms where "sizeof(long
   double) == sizeof(double)" it is an alias to "c_double".

class ctypes.c_float

   Represents the C float datatype. The constructor accepts an
   optional float initializer.

class ctypes.c_int

   Represents the C signed int datatype. The constructor accepts an
   optional integer initializer; no overflow checking is done. On
   platforms where "sizeof(int) == sizeof(long)" it is an alias to
   "c_long".

class ctypes.c_int8

   Represents the C 8-bit signed int datatype. Usually an alias for
   "c_byte".

class ctypes.c_int16

   Represents the C 16-bit signed int datatype. Usually an alias for
   "c_short".

class ctypes.c_int32

   Represents the C 32-bit signed int datatype. Usually an alias for
   "c_int".

class ctypes.c_int64

   Represents the C 64-bit signed int datatype. Usually an alias for
   "c_longlong".

class ctypes.c_long

   Represents the C signed long datatype. The constructor accepts an
   optional integer initializer; no overflow checking is done.

class ctypes.c_longlong

   Represents the C signed long long datatype. The constructor
   accepts an optional integer initializer; no overflow checking is
   done.

class ctypes.c_short

   Represents the C signed short datatype. The constructor accepts an
   optional integer initializer; no overflow checking is done.

class ctypes.c_size_t

   Represents the C "size_t" datatype.

class ctypes.c_ssize_t

   Represents the C "ssize_t" datatype.

   Added in version 3.2.

class ctypes.c_time_t

   Represents the C "time_t" datatype.

   Added in version 3.12.

class ctypes.c_ubyte

   Represents the C unsigned char datatype, and interprets the value
   as a small integer. The constructor accepts an optional integer
   initializer; no overflow checking is done.

class ctypes.c_uint

   Represents the C unsigned int datatype. The constructor accepts an
   optional integer initializer; no overflow checking is done. On
   platforms where "sizeof(int) == sizeof(long)" it is an alias for
   "c_ulong".

class ctypes.c_uint8

   Represents the C 8-bit unsigned int datatype.
   Usually an alias for "c_ubyte".

class ctypes.c_uint16

   Represents the C 16-bit unsigned int datatype. Usually an alias
   for "c_ushort".

class ctypes.c_uint32

   Represents the C 32-bit unsigned int datatype. Usually an alias
   for "c_uint".

class ctypes.c_uint64

   Represents the C 64-bit unsigned int datatype. Usually an alias
   for "c_ulonglong".

class ctypes.c_ulong

   Represents the C unsigned long datatype. The constructor accepts
   an optional integer initializer; no overflow checking is done.

class ctypes.c_ulonglong

   Represents the C unsigned long long datatype. The constructor
   accepts an optional integer initializer; no overflow checking is
   done.

class ctypes.c_ushort

   Represents the C unsigned short datatype. The constructor accepts
   an optional integer initializer; no overflow checking is done.

class ctypes.c_void_p

   Represents the C void* type. The value is represented as integer.
   The constructor accepts an optional integer initializer.

class ctypes.c_wchar

   Represents the C "wchar_t" datatype, and interprets the value as a
   single character unicode string. The constructor accepts an
   optional string initializer; the length of the string must be
   exactly one character.

class ctypes.c_wchar_p

   Represents the C wchar_t* datatype, which must be a pointer to a
   zero-terminated wide character string. The constructor accepts an
   integer address, or a string.

class ctypes.c_bool

   Represents the C bool datatype (more accurately, _Bool from C99).
   Its value can be "True" or "False", and the constructor accepts
   any object that has a truth value.

class ctypes.HRESULT

   Represents an "HRESULT" value, which contains success or error
   information for a function or method call.

   Availability: Windows

class ctypes.py_object

   Represents the C PyObject* datatype. Calling this without an
   argument creates a "NULL" PyObject* pointer.

The "ctypes.wintypes" module provides a number of other
Windows-specific data types, for example "HWND", "WPARAM", or
"DWORD". Some useful structures like "MSG" or "RECT" are also
defined.

Structured data types
---------------------

class ctypes.Union(*args, **kw)

   Abstract base class for unions in native byte order.

class ctypes.BigEndianUnion(*args, **kw)

   Abstract base class for unions in *big endian* byte order.

   Added in version 3.11.

class ctypes.LittleEndianUnion(*args, **kw)

   Abstract base class for unions in *little endian* byte order.

   Added in version 3.11.

class ctypes.BigEndianStructure(*args, **kw)

   Abstract base class for structures in *big endian* byte order.

class ctypes.LittleEndianStructure(*args, **kw)

   Abstract base class for structures in *little endian* byte order.

Structures and unions with non-native byte order cannot contain
pointer type fields, or any other data types containing pointer type
fields.

class ctypes.Structure(*args, **kw)

   Abstract base class for structures in *native* byte order.

Concrete structure and union types must be created by subclassing one
of these types, and at least define a "_fields_" class variable.
"ctypes" will create *descriptor*s which allow reading and writing
the fields by direct attribute accesses. These are the class
variables that can be defined:

_fields_

   A sequence defining the structure fields. The items must be
   2-tuples or 3-tuples. The first item is the name of the field, the
   second item specifies the type of the field; it can be any ctypes
   data type.

   For integer type fields like "c_int", a third optional item can be
   given. It must be a small positive integer defining the bit width
   of the field.

   Field names must be unique within one structure or union. This is
   not checked; only one of the fields can be accessed when names are
   repeated.
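   As a brief sketch (not from the original reference), here is a
   concrete structure with two integer fields:

      >>> from ctypes import Structure, c_int
      >>> class POINT(Structure):
      ...     _fields_ = [("x", c_int), ("y", c_int)]
      ...
      >>> p = POINT(1, 2)
      >>> p.x, p.y
      (1, 2)
      >>> q = POINT(y=5)           # keyword arguments work too
      >>> q.x, q.y
      (0, 5)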
It is possible to define the "_fields_" class variable *after* the
class statement that defines the Structure subclass; this allows
creating data types that directly or indirectly reference themselves:

   class List(Structure):
       pass
   List._fields_ = [("pnext", POINTER(List)),
                    ...
                   ]

The "_fields_" class variable must, however, be defined before the
type is first used (an instance is created, "sizeof()" is called on
it, and so on). Later assignments to the "_fields_" class variable
will raise an AttributeError.

It is possible to define sub-subclasses of structure types; they
inherit the fields of the base class plus the "_fields_" defined in
the sub-subclass, if any.

_pack_

   An optional small integer that allows overriding the alignment of
   structure fields in the instance. "_pack_" must already be defined
   when "_fields_" is assigned, otherwise it will have no effect.
   Setting this attribute to 0 is the same as not setting it at all.

_align_

   An optional small integer that allows overriding the alignment of
   the structure when being packed or unpacked to/from memory.
   Setting this attribute to 0 is the same as not setting it at all.

   Added in version 3.13.

_anonymous_

   An optional sequence that lists the names of unnamed (anonymous)
   fields. "_anonymous_" must already be defined when "_fields_" is
   assigned, otherwise it will have no effect.

   The fields listed in this variable must be structure or union type
   fields. "ctypes" will create descriptors in the structure type
   that allow accessing the nested fields directly, without the need
   to create the structure or union field.

   Here is an example type (Windows):

      class _U(Union):
          _fields_ = [("lptdesc", POINTER(TYPEDESC)),
                      ("lpadesc", POINTER(ARRAYDESC)),
                      ("hreftype", HREFTYPE)]

      class TYPEDESC(Structure):
          _anonymous_ = ("u",)
          _fields_ = [("u", _U),
                      ("vt", VARTYPE)]

   The "TYPEDESC" structure describes a COM data type; the "vt" field
   specifies which one of the union fields is valid. Since the "u"
   field is defined as an anonymous field, it is now possible to
   access the members directly off the TYPEDESC instance.
   "td.lptdesc" and "td.u.lptdesc" are equivalent, but the former is
   faster since it does not need to create a temporary union
   instance:

      td = TYPEDESC()
      td.vt = VT_PTR
      td.lptdesc = POINTER(some_type)
      td.u.lptdesc = POINTER(some_type)

It is possible to define sub-subclasses of structures; they inherit
the fields of the base class. If the subclass definition has a
separate "_fields_" variable, the fields specified in this are
appended to the fields of the base class.

Structure and union constructors accept both positional and keyword
arguments. Positional arguments are used to initialize member fields
in the same order as they appear in "_fields_". Keyword arguments in
the constructor are interpreted as attribute assignments, so they
will initialize "_fields_" with the same name, or create new
attributes for names not present in "_fields_".

Arrays and pointers
-------------------

class ctypes.Array(*args)

   Abstract base class for arrays.

   The recommended way to create concrete array types is by
   multiplying any "ctypes" data type with a non-negative integer.
   Alternatively, you can subclass this type and define "_length_"
   and "_type_" class variables. Array elements can be read and
   written using standard subscript and slice accesses; for slice
   reads, the resulting object is *not* itself an "Array".

   _length_

      A positive integer specifying the number of elements in the
      array. Out-of-range subscripts result in an "IndexError". The
      value is returned by "len()".

   _type_

      Specifies the type of each element in the array.

   Array subclass constructors accept positional arguments, used to
   initialize the elements in order.
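   A quick sketch (not from the original reference) of creating and
   using an array type by multiplication:

      >>> from ctypes import c_int
      >>> TenInts = c_int * 10          # a concrete array type
      >>> arr = TenInts(*range(10))
      >>> len(arr)
      10
      >>> arr[3]
      3
      >>> arr[2:5]                      # slice reads give a plain list
      [2, 3, 4]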
ctypes.ARRAY(type, length)

   Create an array. Equivalent to "type * length", where *type* is a
   "ctypes" data type and *length* an integer.

   This function is *soft deprecated* in favor of multiplication.
   There are no plans to remove it.

class ctypes._Pointer

   Private, abstract base class for pointers.

   Concrete pointer types are created by calling "POINTER()" with the
   type that will be pointed to; this is done automatically by
   "pointer()".

   If a pointer points to an array, its elements can be read and
   written using standard subscript and slice accesses. Pointer
   objects have no size, so "len()" will raise "TypeError". Negative
   subscripts will read from the memory *before* the pointer (as in
   C), and out-of-range subscripts will probably crash with an access
   violation (if you're lucky).

   _type_

      Specifies the type pointed to.

   contents

      Returns the object to which the pointer points. Assigning to
      this attribute changes the pointer to point to the assigned
      object.

"curses.ascii" — Utilities for ASCII characters
***********************************************

**Source code:** Lib/curses/ascii.py

======================================================================

The "curses.ascii" module supplies name constants for ASCII
characters and functions to test membership in various ASCII
character classes. The constants supplied are names for control
characters as follows:

+--------------------+---------------------------------------------+
| Name               | Meaning                                     |
|====================|=============================================|
| curses.ascii.NUL   |                                             |
+--------------------+---------------------------------------------+
| curses.ascii.SOH   | Start of heading, console interrupt         |
+--------------------+---------------------------------------------+
| curses.ascii.STX   | Start of text                               |
+--------------------+---------------------------------------------+
| curses.ascii.ETX   | End of text                                 |
+--------------------+---------------------------------------------+
| curses.ascii.EOT   | End of transmission                         |
+--------------------+---------------------------------------------+
| curses.ascii.ENQ   | Enquiry, goes with "ACK" flow control       |
+--------------------+---------------------------------------------+
| curses.ascii.ACK   | Acknowledgement                             |
+--------------------+---------------------------------------------+
| curses.ascii.BEL   | Bell                                        |
+--------------------+---------------------------------------------+
| curses.ascii.BS    | Backspace                                   |
+--------------------+---------------------------------------------+
| curses.ascii.TAB   | Tab                                         |
+--------------------+---------------------------------------------+
| curses.ascii.HT    | Alias for "TAB": "Horizontal tab"           |
+--------------------+---------------------------------------------+
| curses.ascii.LF    | Line feed                                   |
+--------------------+---------------------------------------------+
| curses.ascii.NL    | Alias for "LF": "New line"                  |
+--------------------+---------------------------------------------+
| curses.ascii.VT    | Vertical tab                                |
+--------------------+---------------------------------------------+
| curses.ascii.FF    | Form feed                                   |
+--------------------+---------------------------------------------+
| curses.ascii.CR    | Carriage return                             |
+--------------------+---------------------------------------------+
| curses.ascii.SO    | Shift-out, begin alternate character set    |
+--------------------+---------------------------------------------+
| curses.ascii.SI    | Shift-in, resume default character set      |
+--------------------+---------------------------------------------+
| curses.ascii.DLE   | Data-link escape                            |
+--------------------+---------------------------------------------+
| curses.ascii.DC1   | XON, for flow control                       |
+--------------------+---------------------------------------------+
| curses.ascii.DC2   | Device control 2, block-mode flow control   |
+--------------------+---------------------------------------------+
| curses.ascii.DC3   | XOFF, for flow control                      |
+--------------------+---------------------------------------------+
| curses.ascii.DC4   | Device control 4                            |
+--------------------+---------------------------------------------+
| curses.ascii.NAK   | Negative acknowledgement                    |
+--------------------+---------------------------------------------+
| curses.ascii.SYN   | Synchronous idle                            |
+--------------------+---------------------------------------------+
| curses.ascii.ETB   | End transmission block                      |
+--------------------+---------------------------------------------+
| curses.ascii.CAN   | Cancel                                      |
+--------------------+---------------------------------------------+
| curses.ascii.EM    | End of medium                               |
+--------------------+---------------------------------------------+
| curses.ascii.SUB   | Substitute                                  |
+--------------------+---------------------------------------------+
| curses.ascii.ESC   | Escape                                      |
+--------------------+---------------------------------------------+
| curses.ascii.FS    | File separator                              |
+--------------------+---------------------------------------------+
| curses.ascii.GS    | Group separator                             |
+--------------------+---------------------------------------------+
| curses.ascii.RS    | Record separator, block-mode terminator     |
+--------------------+---------------------------------------------+
| curses.ascii.US    | Unit separator                              |
+--------------------+---------------------------------------------+
| curses.ascii.SP    | Space                                       |
+--------------------+---------------------------------------------+
| curses.ascii.DEL   | Delete                                      |
+--------------------+---------------------------------------------+

Note that many of these have little practical significance in modern
usage. The mnemonics derive from teleprinter conventions that predate
digital computers.

The module supplies the following functions, patterned on those in
the standard C library:

curses.ascii.isalnum(c)

   Checks for an ASCII alphanumeric character; it is equivalent to
   "isalpha(c) or isdigit(c)".

curses.ascii.isalpha(c)

   Checks for an ASCII alphabetic character; it is equivalent to
   "isupper(c) or islower(c)".

curses.ascii.isascii(c)

   Checks for a character value that fits in the 7-bit ASCII set.

curses.ascii.isblank(c)

   Checks for an ASCII whitespace character; space or horizontal tab.

curses.ascii.iscntrl(c)

   Checks for an ASCII control character (in the range 0x00 to 0x1f
   or 0x7f).

curses.ascii.isdigit(c)

   Checks for an ASCII decimal digit, "'0'" through "'9'". This is
   equivalent to "c in string.digits".

curses.ascii.isgraph(c)

   Checks for any ASCII printable character except space.

curses.ascii.islower(c)

   Checks for an ASCII lower-case character.

curses.ascii.isprint(c)

   Checks for any ASCII printable character including space.
curses.ascii.ispunct(c)

   Checks for any printable ASCII character which is not a space or
   an alphanumeric character.

curses.ascii.isspace(c)

   Checks for ASCII white-space characters; space, line feed,
   carriage return, form feed, horizontal tab, vertical tab.

curses.ascii.isupper(c)

   Checks for an ASCII uppercase letter.

curses.ascii.isxdigit(c)

   Checks for an ASCII hexadecimal digit. This is equivalent to "c in
   string.hexdigits".

curses.ascii.isctrl(c)

   Checks for an ASCII control character (ordinal values 0 to 31).

curses.ascii.ismeta(c)

   Checks for a non-ASCII character (ordinal values 0x80 and above).

These functions accept either integers or single-character strings;
when the argument is a string, it is first converted using the
built-in function "ord()".

Note that all these functions check ordinal bit values derived from
the character of the string you pass in; they do not actually know
anything about the host machine's character encoding.

The following functions take either a single-character string or an
integer byte value; they return a value of the same type.

curses.ascii.ascii(c)

   Return the ASCII value corresponding to the low 7 bits of *c*.

curses.ascii.ctrl(c)

   Return the control character corresponding to the given character
   (the character bit value is bitwise-anded with 0x1f).

curses.ascii.alt(c)

   Return the 8-bit character corresponding to the given ASCII
   character (the character bit value is bitwise-ored with 0x80).

The following function takes either a single-character string or
integer value; it returns a string.

curses.ascii.unctrl(c)

   Return a string representation of the ASCII character *c*. If *c*
   is printable, this string is the character itself. If the
   character is a control character (0x00–0x1f) the string consists
   of a caret ("'^'") followed by the corresponding uppercase letter.
   If the character is an ASCII delete (0x7f) the string is "'^?'".
   If the character has its meta bit (0x80) set, the meta bit is
   stripped, the preceding rules applied, and "'!'" prepended to the
   result.

curses.ascii.controlnames

   A 33-element string array that contains the ASCII mnemonics for
   the thirty-two ASCII control characters from 0 (NUL) to 0x1f (US),
   in order, plus the mnemonic "SP" for the space character.
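A few illustrative calls (not from the original reference):

   >>> from curses import ascii
   >>> ascii.isctrl('\t'), ascii.isprint('\t')
   (True, False)
   >>> ascii.ctrl('c')            # Control-C
   '\x03'
   >>> ascii.unctrl('\x03')
   '^C'
   >>> ascii.controlnames[3]
   'ETX'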
"curses.panel" — A panel stack extension for curses
***************************************************

======================================================================

Panels are windows with the added feature of depth, so they can be
stacked on top of each other, and only the visible portions of each
window will be displayed. Panels can be added, moved up or down in
the stack, and removed.

Functions
=========

The module "curses.panel" defines the following functions:

curses.panel.bottom_panel()

   Returns the bottom panel in the panel stack.

curses.panel.new_panel(win)

   Returns a panel object, associating it with the given window
   *win*. Be aware that you need to keep the returned panel object
   referenced explicitly. If you don't, the panel object is garbage
   collected and removed from the panel stack.

curses.panel.top_panel()

   Returns the top panel in the panel stack.

curses.panel.update_panels()

   Updates the virtual screen after changes in the panel stack. This
   does not call "curses.doupdate()", so you'll have to do this
   yourself.

Panel Objects
=============

Panel objects, as returned by "new_panel()" above, are windows with a
stacking order. There's always a window associated with a panel which
determines the content, while the panel methods are responsible for
the window's depth in the panel stack.

Panel objects have the following methods:

Panel.above()

   Returns the panel above the current panel.

Panel.below()

   Returns the panel below the current panel.

Panel.bottom()

   Push the panel to the bottom of the stack.

Panel.hidden()

   Returns "True" if the panel is hidden (not visible), "False"
   otherwise.

Panel.hide()

   Hide the panel. This does not delete the object, it just makes the
   window on screen invisible.

Panel.move(y, x)

   Move the panel to the screen coordinates "(y, x)".

Panel.replace(win)

   Change the window associated with the panel to the window *win*.

Panel.set_userptr(obj)

   Set the panel's user pointer to *obj*. This is used to associate
   an arbitrary piece of data with the panel, and can be any Python
   object.

Panel.show()

   Display the panel (which might have been hidden).

Panel.top()

   Push panel to the top of the stack.

Panel.userptr()

   Returns the user pointer for the panel. This might be any Python
   object.

Panel.window()

   Returns the window object associated with the panel.

"curses" — Terminal handling for character-cell displays
*********************************************************

**Source code:** Lib/curses

======================================================================

The "curses" module provides an interface to the curses library, the
de-facto standard for portable advanced terminal handling.

While curses is most widely used in the Unix environment, versions
are available for Windows, DOS, and possibly other systems as well.
This extension module is designed to match the API of ncurses, an
open-source curses library hosted on Linux and the BSD variants of
Unix.

Availability: not Android, not iOS, not WASI.

This module is not supported on mobile platforms or WebAssembly
platforms.

Note: Whenever the documentation mentions a *character* it can be
specified as an integer, a one-character Unicode string or a one-byte
byte string. Whenever the documentation mentions a *character string*
it can be specified as a Unicode string or a byte string.

See also:

  Module "curses.ascii"
     Utilities for working with ASCII characters, regardless of your
     locale settings.

  Module "curses.panel"
     A panel stack extension that adds depth to curses windows.

  Module "curses.textpad"
     Editable text widget for curses supporting **Emacs**-like
     bindings.

  Curses Programming with Python
     Tutorial material on using curses with Python, by Andrew
     Kuchling and Eric Raymond.

Functions
=========

The module "curses" defines the following exception:

exception curses.error

   Exception raised when a curses library function returns an error.

Note: Whenever *x* or *y* arguments to a function or a method are
optional, they default to the current cursor location. Whenever
*attr* is optional, it defaults to "A_NORMAL".

The module "curses" defines the following functions:

curses.baudrate()

   Return the output speed of the terminal in bits per second. On
   software terminal emulators it will have a fixed high value.
   Included for historical reasons; in former times, it was used to
   write output loops for time delays and occasionally to change
   interfaces depending on the line speed.

curses.beep()

   Emit a short attention sound.

curses.can_change_color()

   Return "True" or "False", depending on whether the programmer can
   change the colors displayed by the terminal.

curses.cbreak()

   Enter cbreak mode.
   In cbreak mode (sometimes called "rare" mode) normal tty line
   buffering is turned off and characters are available to be read
   one by one. However, unlike raw mode, special characters
   (interrupt, quit, suspend, and flow control) retain their effects
   on the tty driver and calling program. Calling first "raw()" then
   "cbreak()" leaves the terminal in cbreak mode.

curses.color_content(color_number)

   Return the intensity of the red, green, and blue (RGB) components
   in the color *color_number*, which must be between "0" and "COLORS
   - 1". Return a 3-tuple containing the R, G, B values for the given
   color, which will be between "0" (no component) and "1000"
   (maximum amount of component).

curses.color_pair(pair_number)

   Return the attribute value for displaying text in the specified
   color pair. Only the first 256 color pairs are supported. This
   attribute value can be combined with "A_STANDOUT", "A_REVERSE",
   and the other "A_*" attributes. "pair_number()" is the counterpart
   to this function.

curses.curs_set(visibility)

   Set the cursor state. *visibility* can be set to "0", "1", or "2",
   for invisible, normal, or very visible. If the terminal supports
   the visibility requested, return the previous cursor state;
   otherwise raise an exception. On many terminals, the "visible"
   mode is an underline cursor and the "very visible" mode is a block
   cursor.

curses.def_prog_mode()

   Save the current terminal mode as the "program" mode, the mode
   when the running program is using curses. (Its counterpart is the
   "shell" mode, for when the program is not in curses.) Subsequent
   calls to "reset_prog_mode()" will restore this mode.

curses.def_shell_mode()

   Save the current terminal mode as the "shell" mode, the mode when
   the running program is not using curses. (Its counterpart is the
   "program" mode, when the program is using curses capabilities.)
   Subsequent calls to "reset_shell_mode()" will restore this mode.

curses.delay_output(ms)

   Insert an *ms* millisecond pause in output.

curses.doupdate()

   Update the physical screen. The curses library keeps two data
   structures, one representing the current physical screen contents
   and a virtual screen representing the desired next state. The
   "doupdate()" routine updates the physical screen to match the
   virtual screen.

   The virtual screen may be updated by a "noutrefresh()" call after
   write operations such as "addstr()" have been performed on a
   window. The normal "refresh()" call is simply "noutrefresh()"
   followed by "doupdate()"; if you have to update multiple windows,
   you can speed performance and perhaps reduce screen flicker by
   issuing "noutrefresh()" calls on all windows, followed by a single
   "doupdate()".

curses.echo()

   Enter echo mode. In echo mode, each character input is echoed to
   the screen as it is entered.

curses.endwin()

   De-initialize the library, and return terminal to normal status.

curses.erasechar()

   Return the user's current erase character as a one-byte bytes
   object. Under Unix operating systems this is a property of the
   controlling tty of the curses program, and is not set by the
   curses library itself.

curses.filter()

   The "filter()" routine, if used, must be called before "initscr()"
   is called. The effect is that, during those calls, "LINES" is set
   to "1"; the capabilities "clear", "cup", "cud", "cud1", "cuu1",
   "cuu", "vpa" are disabled; and the "home" string is set to the
   value of "cr". The effect is that the cursor is confined to the
   current line, and so are screen updates.
   This may be used for enabling character-at-a-time line editing
   without touching the rest of the screen.

curses.flash()

   Flash the screen. That is, change it to reverse-video and then
   change it back in a short interval. Some people prefer such a
   'visible bell' to the audible attention signal produced by
   "beep()".

curses.flushinp()

   Flush all input buffers. This throws away any typeahead that has
   been typed by the user and has not yet been processed by the
   program.

curses.getmouse()

   After "getch()" returns "KEY_MOUSE" to signal a mouse event, this
   method should be called to retrieve the queued mouse event,
   represented as a 5-tuple "(id, x, y, z, bstate)".

   *id* is an ID value used to distinguish multiple devices, and *x*,
   *y*, *z* are the event's coordinates. (*z* is currently unused.)
   *bstate* is an integer value whose bits will be set to indicate
   the type of event, and will be the bitwise OR of one or more of
   the following constants, where *n* is the button number from 1 to
   5: "BUTTONn_PRESSED", "BUTTONn_RELEASED", "BUTTONn_CLICKED",
   "BUTTONn_DOUBLE_CLICKED", "BUTTONn_TRIPLE_CLICKED",
   "BUTTON_SHIFT", "BUTTON_CTRL", "BUTTON_ALT".

   Changed in version 3.10: The "BUTTON5_*" constants are now exposed
   if they are provided by the underlying curses library.

curses.getsyx()

   Return the current coordinates of the virtual screen cursor as a
   tuple "(y, x)". If "leaveok" is currently "True", then return
   "(-1, -1)".

curses.getwin(file)

   Read window related data stored in the file by an earlier
   "window.putwin()" call. The routine then creates and initializes a
   new window using that data, returning the new window object.

curses.has_colors()

   Return "True" if the terminal can display colors; otherwise,
   return "False".

curses.has_extended_color_support()

   Return "True" if the module supports extended colors; otherwise,
   return "False". Extended color support allows more than 256 color
   pairs for terminals that support more than 16 colors (e.g.
   xterm-256color).

   Extended color support requires ncurses version 6.1 or later.

   Added in version 3.10.

curses.has_ic()

   Return "True" if the terminal has insert- and delete-character
   capabilities. This function is included for historical reasons
   only, as all modern software terminal emulators have such
   capabilities.

curses.has_il()

   Return "True" if the terminal has insert- and delete-line
   capabilities, or can simulate them using scrolling regions. This
   function is included for historical reasons only, as all modern
   software terminal emulators have such capabilities.

curses.has_key(ch)

   Take a key value *ch*, and return "True" if the current terminal
   type recognizes a key with that value.

curses.halfdelay(tenths)

   Used for half-delay mode, which is similar to cbreak mode in that
   characters typed by the user are immediately available to the
   program. However, after blocking for *tenths* tenths of seconds,
   raise an exception if nothing has been typed. The value of
   *tenths* must be a number between "1" and "255". Use "nocbreak()"
   to leave half-delay mode.

curses.init_color(color_number, r, g, b)

   Change the definition of a color, taking the number of the color
   to be changed followed by three RGB values (for the amounts of
   red, green, and blue components). The value of *color_number* must
   be between "0" and "COLORS - 1". Each of *r*, *g*, *b*, must be a
   value between "0" and "1000". When "init_color()" is used, all
   occurrences of that color on the screen immediately change to the
   new definition. This function is a no-op on most terminals; it is
   active only if "can_change_color()" returns "True".
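A short sketch of typical color setup (not from the original
reference), combining "start_color()", "init_pair()" and
"color_pair()" described in this section:

   import curses

   stdscr = curses.initscr()
   curses.start_color()
   # define pair 1 as red text on a black background
   curses.init_pair(1, curses.COLOR_RED, curses.COLOR_BLACK)
   stdscr.addstr(0, 0, "warning!", curses.color_pair(1))
   stdscr.refresh()
   stdscr.getch()
   curses.endwin()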
curses.init_pair(pair_number, fg, bg)

   Change the definition of a color-pair. It takes three arguments:
   the number of the color-pair to be changed, the foreground color
   number, and the background color number. The value of
   *pair_number* must be between "1" and "COLOR_PAIRS - 1" (the "0"
   color pair is wired to white on black and cannot be changed). The
   value of the *fg* and *bg* arguments must be between "0" and
   "COLORS - 1", or, after calling "use_default_colors()", "-1". If
   the color-pair was previously initialized, the screen is refreshed
   and all occurrences of that color-pair are changed to the new
   definition.

curses.initscr()

   Initialize the library. Return a window object which represents
   the whole screen.

   Note: If there is an error opening the terminal, the underlying
   curses library may cause the interpreter to exit.

curses.is_term_resized(nlines, ncols)

   Return "True" if "resize_term()" would modify the window
   structure, "False" otherwise.

curses.isendwin()

   Return "True" if "endwin()" has been called (that is, the curses
   library has been deinitialized).

curses.keyname(k)

   Return the name of the key numbered *k* as a bytes object. The
   name of a key generating a printable ASCII character is the key's
   character. The name of a control-key combination is a two-byte
   bytes object consisting of a caret ("b'^'") followed by the
   corresponding printable ASCII character. The name of an alt-key
   combination (128–255) is a bytes object consisting of the prefix
   "b'M-'" followed by the name of the corresponding ASCII character.

curses.killchar()

   Return the user's current line kill character as a one-byte bytes
   object. Under Unix operating systems this is a property of the
   controlling tty of the curses program, and is not set by the
   curses library itself.

curses.longname()

   Return a bytes object containing the terminfo long name field
   describing the current terminal. The maximum length of a verbose
   description is 128 characters. It is defined only after the call
   to "initscr()".

curses.meta(flag)

   If *flag* is "True", allow 8-bit characters to be input. If *flag*
   is "False", allow only 7-bit chars.

curses.mouseinterval(interval)

   Set the maximum time in milliseconds that can elapse between press
   and release events in order for them to be recognized as a click,
   and return the previous interval value. The default value is 200
   milliseconds, or one fifth of a second.

curses.mousemask(mousemask)

   Set the mouse events to be reported, and return a tuple
   "(availmask, oldmask)". *availmask* indicates which of the
   specified mouse events can be reported; on complete failure it
   returns "0". *oldmask* is the previous value of the given window's
   mouse event mask. If this function is never called, no mouse
   events are ever reported.

curses.napms(ms)

   Sleep for *ms* milliseconds.

curses.newpad(nlines, ncols)

   Create and return a new pad data structure with the given number
   of lines and columns. The pad is returned as a window object.

   A pad is like a window, except that it is not restricted by the
   screen size, and is not necessarily associated with a particular
   part of the screen. Pads can be used when a large window is
   needed, and only a part of the window will be on the screen at one
   time. Automatic refreshes of pads (such as from scrolling or
   echoing of input) do not occur. The "refresh()" and
   "noutrefresh()" methods of a pad require 6 arguments to specify
   the part of the pad to be displayed and the location on the screen
   to be used for the display. The arguments are *pminrow*,
   *pmincol*, *sminrow*, *smincol*, *smaxrow*, *smaxcol*; the *p*
   arguments refer to the upper left corner of the pad region to be
   displayed and the *s* arguments define a clipping box on the
   screen within which the pad region is to be displayed.
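   For example, a minimal sketch (not from the original reference; it
   assumes the terminal is at least 15 lines by 45 columns):

      import curses

      stdscr = curses.initscr()
      pad = curses.newpad(100, 100)
      # fill the pad with some content
      for y in range(99):
          pad.addstr(y, 0, "line %d" % y)
      # display a 10x40 region of the pad at screen position (5, 5)
      pad.refresh(0, 0, 5, 5, 14, 44)
      curses.napms(2000)    # pause briefly so the output is visible
      curses.endwin()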
The "refresh()" and "noutrefresh()" methods of a pad require 6 arguments to specify the part of the pad to be displayed and the location on the screen to be used for the display. The arguments are *pminrow*, *pmincol*, *sminrow*, *smincol*, *smaxrow*, *smaxcol*; the *p* arguments refer to the upper left corner of the pad region to be displayed and the *s* arguments define a clipping box on the screen within which the pad region is to be displayed. curses.newwin(nlines, ncols) curses.newwin(nlines, ncols, begin_y, begin_x) Return a new window, whose left-upper corner is at "(begin_y, begin_x)", and whose height/width is *nlines*/*ncols*. By default, the window will extend from the specified position to the lower right corner of the screen. curses.nl() Enter newline mode. This mode translates the return key into newline on input, and translates newline into return and line-feed on output. Newline mode is initially on. curses.nocbreak() Leave cbreak mode. Return to normal “cooked” mode with line buffering. curses.noecho() Leave echo mode. Echoing of input characters is turned off. curses.nonl() Leave newline mode. Disable translation of return into newline on input, and disable low-level translation of newline into newline/return on output (but this does not change the behavior of "addch('\n')", which always does the equivalent of return and line feed on the virtual screen). With translation off, curses can sometimes speed up vertical motion a little; also, it will be able to detect the return key on input. curses.noqiflush() When the "noqiflush()" routine is used, normal flush of input and output queues associated with the "INTR", "QUIT" and "SUSP" characters will not be done. You may want to call "noqiflush()" in a signal handler if you want output to continue as though the interrupt had not occurred, after the handler exits. curses.noraw() Leave raw mode. Return to normal “cooked” mode with line buffering. curses.pair_content(pair_number) Return a tuple "(fg, bg)" containing the colors for the requested color pair. The value of *pair_number* must be between "0" and "COLOR_PAIRS - 1". curses.pair_number(attr) Return the number of the color-pair set by the attribute value *attr*. "color_pair()" is the counterpart to this function. curses.putp(str) Equivalent to "tputs(str, 1, putchar)"; emit the value of a specified terminfo capability for the current terminal. Note that the output of "putp()" always goes to standard output. curses.qiflush([flag]) If *flag* is "False", the effect is the same as calling "noqiflush()". If *flag* is "True", or no argument is provided, the queues will be flushed when these control characters are read. curses.raw() Enter raw mode. In raw mode, normal line buffering and processing of interrupt, quit, suspend, and flow control keys are turned off; characters are presented to curses input functions one by one. curses.reset_prog_mode() Restore the terminal to “program” mode, as previously saved by "def_prog_mode()". curses.reset_shell_mode() Restore the terminal to “shell” mode, as previously saved by "def_shell_mode()". curses.resetty() Restore the state of the terminal modes to what it was at the last call to "savetty()". curses.resize_term(nlines, ncols) Backend function used by "resizeterm()", performing most of the work; when resizing the windows, "resize_term()" blank-fills the areas that are extended. The calling application should fill in these areas with appropriate data. The "resize_term()" function attempts to resize all windows. 
   However, due to the calling convention of pads, it is not possible
   to resize these without additional interaction with the
   application.

curses.resizeterm(nlines, ncols)

   Resize the standard and current windows to the specified
   dimensions, and adjust other bookkeeping data used by the curses
   library that record the window dimensions (in particular the
   SIGWINCH handler).

curses.savetty()

   Save the current state of the terminal modes in a buffer, usable
   by "resetty()".

curses.get_escdelay()

   Retrieves the value set by "set_escdelay()".

   Added in version 3.9.

curses.set_escdelay(ms)

   Sets the number of milliseconds to wait after reading an escape
   character, to distinguish an individual escape character entered
   on the keyboard from escape sequences sent by cursor and function
   keys.

   Added in version 3.9.

curses.get_tabsize()

   Retrieves the value set by "set_tabsize()".

   Added in version 3.9.

curses.set_tabsize(size)

   Sets the number of columns used by the curses library when
   converting a tab character to spaces as it adds the tab to a
   window.

   Added in version 3.9.

curses.setsyx(y, x)

   Set the virtual screen cursor to *y*, *x*. If *y* and *x* are both
   "-1", then "leaveok" is set "True".

curses.setupterm(term=None, fd=-1)

   Initialize the terminal. *term* is a string giving the terminal
   name, or "None"; if omitted or "None", the value of the "TERM"
   environment variable will be used. *fd* is the file descriptor to
   which any initialization sequences will be sent; if not supplied
   or "-1", the file descriptor for "sys.stdout" will be used.

curses.start_color()

   Must be called if the programmer wants to use colors, and before
   any other color manipulation routine is called. It is good
   practice to call this routine right after "initscr()".

   "start_color()" initializes eight basic colors (black, red, green,
   yellow, blue, magenta, cyan, and white), and two global variables
   in the "curses" module, "COLORS" and "COLOR_PAIRS", containing the
   maximum number of colors and color-pairs the terminal can support.
   It also restores the colors on the terminal to the values they had
   when the terminal was just turned on.

curses.termattrs()

   Return a logical OR of all video attributes supported by the
   terminal. This information is useful when a curses program needs
   complete control over the appearance of the screen.

curses.termname()

   Return the value of the environment variable "TERM", as a bytes
   object, truncated to 14 characters.

curses.tigetflag(capname)

   Return the value of the Boolean capability corresponding to the
   terminfo capability name *capname* as an integer. Return the value
   "-1" if *capname* is not a Boolean capability, or "0" if it is
   canceled or absent from the terminal description.

curses.tigetnum(capname)

   Return the value of the numeric capability corresponding to the
   terminfo capability name *capname* as an integer. Return the value
   "-2" if *capname* is not a numeric capability, or "-1" if it is
   canceled or absent from the terminal description.

curses.tigetstr(capname)

   Return the value of the string capability corresponding to the
   terminfo capability name *capname* as a bytes object. Return
   "None" if *capname* is not a terminfo "string capability", or is
   canceled or absent from the terminal description.

curses.tparm(str[, ...])

   Instantiate the bytes object *str* with the supplied parameters,
   where *str* should be a parameterized string obtained from the
   terminfo database. E.g. "tparm(tigetstr("cup"), 5, 3)" could
   result in "b'\033[6;4H'", the exact result depending on terminal
   type.
curses.typeahead(fd) Specify that the file descriptor *fd* be used for typeahead checking. If *fd* is "-1", then no typeahead checking is done. The curses library does “line-breakout optimization” by looking for typeahead periodically while updating the screen. If input is found, and it is coming from a tty, the current update is postponed until refresh or doupdate is called again, allowing faster response to commands typed in advance. This function allows specifying a different file descriptor for typeahead checking. curses.unctrl(ch) Return a bytes object which is a printable representation of the character *ch*. Control characters are represented as a caret followed by the character, for example as "b'^C'". Printing characters are left as they are. curses.ungetch(ch) Push *ch* so the next "getch()" will return it. Note: Only one *ch* can be pushed before "getch()" is called. curses.update_lines_cols() Update the "LINES" and "COLS" module variables. Useful for detecting manual screen resize. Added in version 3.5. curses.unget_wch(ch) Push *ch* so the next "get_wch()" will return it. Note: Only one *ch* can be pushed before "get_wch()" is called. Added in version 3.3. curses.ungetmouse(id, x, y, z, bstate) Push a "KEY_MOUSE" event onto the input queue, associating the given state data with it. curses.use_env(flag) If used, this function should be called before "initscr()" or newterm are called. When *flag* is "False", the values of lines and columns specified in the terminfo database will be used, even if environment variables "LINES" and "COLUMNS" (used by default) are set, or if curses is running in a window (in which case default behavior would be to use the window size if "LINES" and "COLUMNS" are not set). curses.use_default_colors() Allow use of default values for colors on terminals supporting this feature. Use this to support transparency in your application. The default color is assigned to the color number "-1". After calling this function, "init_pair(x, curses.COLOR_RED, -1)" initializes, for instance, color pair *x* to a red foreground color on the default background. curses.wrapper(func, /, *args, **kwargs) Initialize curses and call another callable object, *func*, which should be the rest of your curses-using application. If the application raises an exception, this function will restore the terminal to a sane state before re-raising the exception and generating a traceback. The callable object *func* is then passed the main window ‘stdscr’ as its first argument, followed by any other arguments passed to "wrapper()". Before calling *func*, "wrapper()" turns on cbreak mode, turns off echo, enables the terminal keypad, and initializes colors if the terminal has color support. On exit (whether normally or by exception) it restores cooked mode, turns on echo, and disables the terminal keypad. Window Objects ============== Window objects, as returned by "initscr()" and "newwin()" above, have the following methods and attributes: window.addch(ch[, attr]) window.addch(y, x, ch[, attr]) Paint character *ch* at "(y, x)" with attributes *attr*, overwriting any character previously painted at that location. By default, the character position and attributes are the current settings for the window object. Note: Writing outside the window, subwindow, or pad raises a "curses.error". Attempting to write to the lower right corner of a window, subwindow, or pad will cause an exception to be raised after the character is printed. 
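For example, painting a few characters with explicit positions and
attributes looks like this (a sketch; it relies on "wrapper()",
described above, for setup and teardown):

   import curses

   def main(stdscr):
       stdscr.addch(0, 0, "*")
       stdscr.addch(0, 1, "*", curses.A_BOLD | curses.A_REVERSE)
       # ACS_* constants exist only after initscr(), which wrapper()
       # has already called at this point.
       stdscr.addch(1, 0, curses.ACS_DIAMOND)
       stdscr.refresh()
       stdscr.getch()              # wait for a key before exiting

   curses.wrapper(main)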
window.addnstr(str, n[, attr])
window.addnstr(y, x, str, n[, attr])

   Paint at most *n* characters of the character string *str* at
   "(y, x)" with attributes *attr*, overwriting anything previously
   on the display.

window.addstr(str[, attr])
window.addstr(y, x, str[, attr])

   Paint the character string *str* at "(y, x)" with attributes
   *attr*, overwriting anything previously on the display.

   Note:

     * Writing outside the window, subwindow, or pad raises
       "curses.error". Attempting to write to the lower right corner
       of a window, subwindow, or pad will cause an exception to be
       raised after the string is printed.

     * A bug in ncurses, the backend for this Python module, can
       cause segmentation faults when resizing windows. This is fixed
       in ncurses-6.1-20190511. If you are stuck with an earlier
       ncurses, you can avoid triggering this if you do not call
       "addstr()" with a *str* that has embedded newlines. Instead,
       call "addstr()" separately for each line.

window.attroff(attr)

   Remove attribute *attr* from the “background” set applied to all
   writes to the current window.

window.attron(attr)

   Add attribute *attr* to the “background” set applied to all writes
   to the current window.

window.attrset(attr)

   Set the “background” set of attributes to *attr*. This set is
   initially "0" (no attributes).

window.bkgd(ch[, attr])

   Set the background property of the window to the character *ch*,
   with attributes *attr*. The change is then applied to every
   character position in that window:

   * The attribute of every character in the window is changed to the
     new background attribute.

   * Wherever the former background character appears, it is changed
     to the new background character.

window.bkgdset(ch[, attr])

   Set the window’s background. A window’s background consists of a
   character and any combination of attributes. The attribute part of
   the background is combined (OR’ed) with all non-blank characters
   that are written into the window. Both the character and attribute
   parts of the background are combined with the blank characters.
   The background becomes a property of the character and moves with
   the character through any scrolling and insert/delete
   line/character operations.
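The interaction of the “background” attribute calls and "bkgd()" can
be seen in this short sketch (window geometry and text are arbitrary;
"wrapper()" is described above):

   import curses

   def main(stdscr):
       win = curses.newwin(5, 30, 2, 2)
       win.attrset(curses.A_BOLD)       # subsequent writes are bold
       win.addstr(1, 1, "bold text")
       win.attroff(curses.A_BOLD)       # drop bold from the set again
       win.bkgd(" ", curses.A_REVERSE)  # reverse-video background
       win.refresh()
       win.getch()

   curses.wrapper(main)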
window.border([ls[, rs[, ts[, bs[, tl[, tr[, bl[, br]]]]]]]])

   Draw a border around the edges of the window. Each parameter
   specifies the character to use for a specific part of the border;
   see the table below for more details.

   Note: A "0" value for any parameter will cause the default
     character to be used for that parameter. Keyword parameters can
     *not* be used.

   The defaults are listed in this table:

   +-------------+-----------------------+-------------------------+
   | Parameter   | Description           | Default value           |
   |=============|=======================|=========================|
   | *ls*        | Left side             | "ACS_VLINE"             |
   +-------------+-----------------------+-------------------------+
   | *rs*        | Right side            | "ACS_VLINE"             |
   +-------------+-----------------------+-------------------------+
   | *ts*        | Top                   | "ACS_HLINE"             |
   +-------------+-----------------------+-------------------------+
   | *bs*        | Bottom                | "ACS_HLINE"             |
   +-------------+-----------------------+-------------------------+
   | *tl*        | Upper-left corner     | "ACS_ULCORNER"          |
   +-------------+-----------------------+-------------------------+
   | *tr*        | Upper-right corner    | "ACS_URCORNER"          |
   +-------------+-----------------------+-------------------------+
   | *bl*        | Bottom-left corner    | "ACS_LLCORNER"          |
   +-------------+-----------------------+-------------------------+
   | *br*        | Bottom-right corner   | "ACS_LRCORNER"          |
   +-------------+-----------------------+-------------------------+

window.box([vertch, horch])

   Similar to "border()", but both *ls* and *rs* are *vertch* and
   both *ts* and *bs* are *horch*. The default corner characters are
   always used by this function.

window.chgat(attr)
window.chgat(num, attr)
window.chgat(y, x, attr)
window.chgat(y, x, num, attr)

   Set the attributes of *num* characters at the current cursor
   position, or at position "(y, x)" if supplied. If *num* is not
   given or is "-1", the attribute will be set on all the characters
   to the end of the line. This function moves the cursor to position
   "(y, x)" if supplied. The changed line will be touched using the
   "touchline()" method so that the contents will be redisplayed by
   the next window refresh.

window.clear()

   Like "erase()", but also cause the whole window to be repainted
   upon next call to "refresh()".

window.clearok(flag)

   If *flag* is "True", the next call to "refresh()" will clear the
   window completely.

window.clrtobot()

   Erase from cursor to the end of the window: all lines below the
   cursor are deleted, and then the equivalent of "clrtoeol()" is
   performed.

window.clrtoeol()

   Erase from cursor to the end of the line.

window.cursyncup()

   Update the current cursor position of all the ancestors of the
   window to reflect the current cursor position of the window.

window.delch([y, x])

   Delete any character at "(y, x)".

window.deleteln()

   Delete the line under the cursor. All following lines are moved up
   by one line.

window.derwin(begin_y, begin_x)
window.derwin(nlines, ncols, begin_y, begin_x)

   An abbreviation for “derive window”, "derwin()" is the same as
   calling "subwin()", except that *begin_y* and *begin_x* are
   relative to the origin of the window, rather than relative to the
   entire screen. Return a window object for the derived window.

window.echochar(ch[, attr])

   Add character *ch* with attribute *attr*, and immediately call
   "refresh()" on the window.

window.enclose(y, x)

   Test whether the given pair of screen-relative character-cell
   coordinates are enclosed by the given window, returning "True" or
   "False". It is useful for determining what subset of the screen
   windows enclose the location of a mouse event.

   Changed in version 3.10: Previously it returned "1" or "0" instead
   of "True" or "False".

window.encoding

   Encoding used to encode method arguments (Unicode strings and
   characters). The encoding attribute is inherited from the parent
   window when a subwindow is created, for example with
   "window.subwin()". By default, current locale encoding is used
   (see "locale.getencoding()").

   Added in version 3.3.
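For example, a framed layout can be built from "border()" and
"derwin()" (a sketch; the geometry is arbitrary and assumes a terminal
of at least roughly 12 rows by 45 columns):

   import curses

   def main(stdscr):
       outer = curses.newwin(10, 40, 1, 1)
       outer.border()                      # defaults: ACS_VLINE, ACS_HLINE, ...
       inner = outer.derwin(8, 38, 1, 1)   # inset by one cell on every side
       inner.addstr(0, 0, "content goes inside the frame")
       outer.refresh()
       inner.refresh()
       inner.getch()

   curses.wrapper(main)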
window.erase()

   Clear the window.

window.getbegyx()

   Return a tuple "(y, x)" of coordinates of upper-left corner.

window.getbkgd()

   Return the given window’s current background character/attribute
   pair.

window.getch([y, x])

   Get a character. Note that the integer returned does *not* have to
   be in ASCII range: function keys, keypad keys and so on are
   represented by numbers higher than 255. In no-delay mode, return
   "-1" if there is no input, otherwise wait until a key is pressed.

window.get_wch([y, x])

   Get a wide character. Return a character for most keys, or an
   integer for function keys, keypad keys, and other special keys. In
   no-delay mode, raise an exception if there is no input.

   Added in version 3.3.

window.getkey([y, x])

   Get a character, returning a string instead of an integer, as
   "getch()" does. Function keys, keypad keys and other special keys
   return a multibyte string containing the key name. In no-delay
   mode, raise an exception if there is no input.

window.getmaxyx()

   Return a tuple "(y, x)" of the height and width of the window.

window.getparyx()

   Return the beginning coordinates of this window relative to its
   parent window as a tuple "(y, x)". Return "(-1, -1)" if this
   window has no parent.

window.getstr()
window.getstr(n)
window.getstr(y, x)
window.getstr(y, x, n)

   Read a bytes object from the user, with primitive line editing
   capacity.

window.getyx()

   Return a tuple "(y, x)" of current cursor position relative to the
   window’s upper-left corner.

window.hline(ch, n)
window.hline(y, x, ch, n)

   Display a horizontal line starting at "(y, x)" with length *n*
   consisting of the character *ch*.

window.idcok(flag)

   If *flag* is "False", curses no longer considers using the
   hardware insert/delete character feature of the terminal; if
   *flag* is "True", use of character insertion and deletion is
   enabled. When curses is first initialized, use of character
   insert/delete is enabled by default.

window.idlok(flag)

   If *flag* is "True", "curses" will try to use hardware line
   editing facilities. Otherwise, line insertion/deletion are
   disabled.

window.immedok(flag)

   If *flag* is "True", any change in the window image automatically
   causes the window to be refreshed; you no longer have to call
   "refresh()" yourself. However, it may degrade performance
   considerably, due to repeated calls to wrefresh. This option is
   disabled by default.

window.inch([y, x])

   Return the character at the given position in the window. The
   bottom 8 bits are the character proper, and upper bits are the
   attributes.

window.insch(ch[, attr])
window.insch(y, x, ch[, attr])

   Paint character *ch* at "(y, x)" with attributes *attr*, moving
   the line from position *x* right by one character.

window.insdelln(nlines)

   Insert *nlines* lines into the specified window above the current
   line. The *nlines* bottom lines are lost. For negative *nlines*,
   delete *nlines* lines starting with the one under the cursor, and
   move the remaining lines up. The bottom *nlines* lines are
   cleared. The current cursor position remains the same.

window.insertln()

   Insert a blank line under the cursor. All following lines are
   moved down by one line.

window.insnstr(str, n[, attr])
window.insnstr(y, x, str, n[, attr])

   Insert a character string (as many characters as will fit on the
   line) before the character under the cursor, up to *n* characters.
   If *n* is zero or negative, the entire string is inserted. All
   characters to the right of the cursor are shifted right, with the
   rightmost characters on the line being lost. The cursor position
   does not change (after moving to *y*, *x*, if specified).
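A typical read loop built on "getch()" looks like this sketch (the
choice of quit key is arbitrary; "wrapper()" enables the keypad, so
"KEY_*" codes are delivered):

   import curses

   def main(stdscr):
       stdscr.addstr(0, 0, "Press keys; 'q' quits.")
       while True:
           ch = stdscr.getch()          # an int; > 255 for special keys
           if ch == ord("q"):
               break
           elif ch == curses.KEY_UP:
               stdscr.addstr(1, 0, "up arrow    ")
           else:
               stdscr.addstr(1, 0, "code: %d    " % ch)

   curses.wrapper(main)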
window.insstr(str[, attr])
window.insstr(y, x, str[, attr])

   Insert a character string (as many characters as will fit on the
   line) before the character under the cursor. All characters to the
   right of the cursor are shifted right, with the rightmost
   characters on the line being lost. The cursor position does not
   change (after moving to *y*, *x*, if specified).

window.instr([n])
window.instr(y, x[, n])

   Return a bytes object of characters, extracted from the window
   starting at the current cursor position, or at *y*, *x* if
   specified. Attributes are stripped from the characters. If *n* is
   specified, "instr()" returns a string at most *n* characters long
   (exclusive of the trailing NUL).

window.is_linetouched(line)

   Return "True" if the specified line was modified since the last
   call to "refresh()"; otherwise return "False". Raise a
   "curses.error" exception if *line* is not valid for the given
   window.

window.is_wintouched()

   Return "True" if the specified window was modified since the last
   call to "refresh()"; otherwise return "False".

window.keypad(flag)

   If *flag* is "True", escape sequences generated by some keys
   (keypad, function keys) will be interpreted by "curses". If *flag*
   is "False", escape sequences will be left as is in the input
   stream.

window.leaveok(flag)

   If *flag* is "True", cursor is left where it is on update, instead
   of being at “cursor position.” This reduces cursor movement where
   possible. If possible the cursor will be made invisible. If *flag*
   is "False", cursor will always be at “cursor position” after an
   update.

window.move(new_y, new_x)

   Move cursor to "(new_y, new_x)".

window.mvderwin(y, x)

   Move the window inside its parent window. The screen-relative
   parameters of the window are not changed. This routine is used to
   display different parts of the parent window at the same physical
   position on the screen.

window.mvwin(new_y, new_x)

   Move the window so its upper-left corner is at "(new_y, new_x)".

window.nodelay(flag)

   If *flag* is "True", "getch()" will be non-blocking.

window.notimeout(flag)

   If *flag* is "True", escape sequences will not be timed out. If
   *flag* is "False", after a few milliseconds, an escape sequence
   will not be interpreted, and will be left in the input stream as
   is.

window.noutrefresh()

   Mark for refresh but wait. This function updates the data
   structure representing the desired state of the window, but does
   not force an update of the physical screen. To accomplish that,
   call "doupdate()".

window.overlay(destwin[, sminrow, smincol, dminrow, dmincol, dmaxrow, dmaxcol])

   Overlay the window on top of *destwin*. The windows need not be
   the same size, only the overlapping region is copied. This copy is
   non-destructive, which means that the current background character
   does not overwrite the old contents of *destwin*.

   To get fine-grained control over the copied region, the second
   form of "overlay()" can be used. *sminrow* and *smincol* are the
   upper-left coordinates of the source window, and the other
   variables mark a rectangle in the destination window.

window.overwrite(destwin[, sminrow, smincol, dminrow, dmincol, dmaxrow, dmaxcol])

   Overwrite the window on top of *destwin*. The windows need not be
   the same size, in which case only the overlapping region is
   copied. This copy is destructive, which means that the current
   background character overwrites the old contents of *destwin*.

   To get fine-grained control over the copied region, the second
   form of "overwrite()" can be used. *sminrow* and *smincol* are the
   upper-left coordinates of the source window, the other variables
   mark a rectangle in the destination window.
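The batched-update pattern enabled by "noutrefresh()" and
"doupdate()" looks like this in practice (a sketch; window sizes and
text are arbitrary):

   import curses

   def main(stdscr):
       top = curses.newwin(3, 40, 0, 0)
       bottom = curses.newwin(3, 40, 4, 0)
       top.addstr(0, 0, "first window")
       bottom.addstr(0, 0, "second window")
       top.noutrefresh()        # update the virtual screen only
       bottom.noutrefresh()
       curses.doupdate()        # one physical-screen update for both
       bottom.getch()

   curses.wrapper(main)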
window.putwin(file)

   Write all data associated with the window into the provided file
   object. This information can be later retrieved using the
   "getwin()" function.

window.redrawln(beg, num)

   Indicate that the *num* screen lines, starting at line *beg*, are
   corrupted and should be completely redrawn on the next "refresh()"
   call.

window.redrawwin()

   Touch the entire window, causing it to be completely redrawn on
   the next "refresh()" call.

window.refresh([pminrow, pmincol, sminrow, smincol, smaxrow, smaxcol])

   Update the display immediately (sync actual screen with previous
   drawing/deleting methods).

   The 6 optional arguments can only be specified when the window is
   a pad created with "newpad()". The additional parameters are
   needed to indicate what part of the pad and screen are involved.
   *pminrow* and *pmincol* specify the upper left-hand corner of the
   rectangle to be displayed in the pad. *sminrow*, *smincol*,
   *smaxrow*, and *smaxcol* specify the edges of the rectangle to be
   displayed on the screen. The lower right-hand corner of the
   rectangle to be displayed in the pad is calculated from the screen
   coordinates, since the rectangles must be the same size. Both
   rectangles must be entirely contained within their respective
   structures. Negative values of *pminrow*, *pmincol*, *sminrow*, or
   *smincol* are treated as if they were zero.

window.resize(nlines, ncols)

   Reallocate storage for a curses window to adjust its dimensions to
   the specified values. If either dimension is larger than the
   current values, the window’s data is filled with blanks that have
   the current background rendition (as set by "bkgdset()") merged
   into them.

window.scroll([lines=1])

   Scroll the screen or scrolling region upward by *lines* lines.

window.scrollok(flag)

   Control what happens when the cursor of a window is moved off the
   edge of the window or scrolling region, either as a result of a
   newline action on the bottom line, or typing the last character of
   the last line. If *flag* is "False", the cursor is left on the
   bottom line. If *flag* is "True", the window is scrolled up one
   line. Note that in order to get the physical scrolling effect on
   the terminal, it is also necessary to call "idlok()".

window.setscrreg(top, bottom)

   Set the scrolling region from line *top* to line *bottom*. All
   scrolling actions will take place in this region.

window.standend()

   Turn off the standout attribute. On some terminals this has the
   side effect of turning off all attributes.

window.standout()

   Turn on attribute *A_STANDOUT*.

window.subpad(begin_y, begin_x)
window.subpad(nlines, ncols, begin_y, begin_x)

   Return a sub-window, whose upper-left corner is at "(begin_y,
   begin_x)", and whose width/height is *ncols*/*nlines*.

window.subwin(begin_y, begin_x)
window.subwin(nlines, ncols, begin_y, begin_x)

   Return a sub-window, whose upper-left corner is at "(begin_y,
   begin_x)", and whose width/height is *ncols*/*nlines*. By default,
   the sub-window will extend from the specified position to the
   lower right corner of the window.

window.syncdown()

   Touch each location in the window that has been touched in any of
   its ancestor windows. This routine is called by "refresh()", so it
   should almost never be necessary to call it manually.
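A pad and the six-argument form of "refresh()" work together as in
this sketch (it assumes a terminal of at least 15 rows and 42
columns):

   import curses

   def main(stdscr):
       stdscr.refresh()                  # paint the empty background first
       pad = curses.newpad(100, 100)
       for y in range(100):
           pad.addstr(y, 0, "pad line %d" % y)
       # Show pad rows 10..19, columns 0..39, at screen rows 5..14,
       # columns 2..41 -- both rectangles are 10 by 40 cells.
       pad.refresh(10, 0, 5, 2, 14, 41)
       stdscr.getch()

   curses.wrapper(main)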
window.syncok(flag)

   If *flag* is "True", then "syncup()" is called automatically
   whenever there is a change in the window.

window.syncup()

   Touch all locations in ancestors of the window that have been
   changed in the window.

window.timeout(delay)

   Set blocking or non-blocking read behavior for the window. If
   *delay* is negative, blocking read is used (which will wait
   indefinitely for input). If *delay* is zero, then non-blocking
   read is used, and "getch()" will return "-1" if no input is
   waiting. If *delay* is positive, then "getch()" will block for
   *delay* milliseconds, and return "-1" if there is still no input
   at the end of that time.

window.touchline(start, count[, changed])

   Pretend *count* lines have been changed, starting with line
   *start*. If *changed* is supplied, it specifies whether the
   affected lines are marked as having been changed ("changed=True")
   or unchanged ("changed=False").

window.touchwin()

   Pretend the whole window has been changed, for purposes of drawing
   optimizations.

window.untouchwin()

   Mark all lines in the window as unchanged since the last call to
   "refresh()".

window.vline(ch, n[, attr])
window.vline(y, x, ch, n[, attr])

   Display a vertical line starting at "(y, x)" with length *n*
   consisting of the character *ch* with attributes *attr*.

Constants
=========

The "curses" module defines the following data members:

curses.ERR

   Some curses routines that return an integer, such as "getch()",
   return "ERR" upon failure.

curses.OK

   Some curses routines that return an integer, such as "napms()",
   return "OK" upon success.

curses.version
curses.__version__

   A bytes object representing the current version of the module.

curses.ncurses_version

   A named tuple containing the three components of the ncurses
   library version: *major*, *minor*, and *patch*. All values are
   integers. The components can also be accessed by name, so
   "curses.ncurses_version[0]" is equivalent to
   "curses.ncurses_version.major" and so on.

   Availability: if the ncurses library is used.

   Added in version 3.8.

curses.COLORS

   The maximum number of colors the terminal can support. It is
   defined only after the call to "start_color()".

curses.COLOR_PAIRS

   The maximum number of color pairs the terminal can support. It is
   defined only after the call to "start_color()".

curses.COLS

   The width of the screen, i.e., the number of columns. It is
   defined only after the call to "initscr()". Updated by
   "update_lines_cols()", "resizeterm()" and "resize_term()".

curses.LINES

   The height of the screen, i.e., the number of lines. It is defined
   only after the call to "initscr()". Updated by
   "update_lines_cols()", "resizeterm()" and "resize_term()".

Some constants are available to specify character cell attributes.
The exact constants available are system dependent.
+--------------------------+---------------------------------+ | Attribute | Meaning | |==========================|=================================| | curses.A_ALTCHARSET | Alternate character set mode | +--------------------------+---------------------------------+ | curses.A_BLINK | Blink mode | +--------------------------+---------------------------------+ | curses.A_BOLD | Bold mode | +--------------------------+---------------------------------+ | curses.A_DIM | Dim mode | +--------------------------+---------------------------------+ | curses.A_INVIS | Invisible or blank mode | +--------------------------+---------------------------------+ | curses.A_ITALIC | Italic mode | +--------------------------+---------------------------------+ | curses.A_NORMAL | Normal attribute | +--------------------------+---------------------------------+ | curses.A_PROTECT | Protected mode | +--------------------------+---------------------------------+ | curses.A_REVERSE | Reverse background and | | | foreground colors | +--------------------------+---------------------------------+ | curses.A_STANDOUT | Standout mode | +--------------------------+---------------------------------+ | curses.A_UNDERLINE | Underline mode | +--------------------------+---------------------------------+ | curses.A_HORIZONTAL | Horizontal highlight | +--------------------------+---------------------------------+ | curses.A_LEFT | Left highlight | +--------------------------+---------------------------------+ | curses.A_LOW | Low highlight | +--------------------------+---------------------------------+ | curses.A_RIGHT | Right highlight | +--------------------------+---------------------------------+ | curses.A_TOP | Top highlight | +--------------------------+---------------------------------+ | curses.A_VERTICAL | Vertical highlight | +--------------------------+---------------------------------+ Added in version 3.7: "A_ITALIC" was added. Several constants are available to extract corresponding attributes returned by some methods. +---------------------------+---------------------------------+ | Bit-mask | Meaning | |===========================|=================================| | curses.A_ATTRIBUTES | Bit-mask to extract attributes | +---------------------------+---------------------------------+ | curses.A_CHARTEXT | Bit-mask to extract a character | +---------------------------+---------------------------------+ | curses.A_COLOR | Bit-mask to extract color-pair | | | field information | +---------------------------+---------------------------------+ Keys are referred to by integer constants with names starting with "KEY_". The exact keycaps available are system dependent. 
+---------------------------+----------------------------------------------+ | Key constant | Key | |===========================|==============================================| | curses.KEY_MIN | Minimum key value | +---------------------------+----------------------------------------------+ | curses.KEY_BREAK | Break key (unreliable) | +---------------------------+----------------------------------------------+ | curses.KEY_DOWN | Down-arrow | +---------------------------+----------------------------------------------+ | curses.KEY_UP | Up-arrow | +---------------------------+----------------------------------------------+ | curses.KEY_LEFT | Left-arrow | +---------------------------+----------------------------------------------+ | curses.KEY_RIGHT | Right-arrow | +---------------------------+----------------------------------------------+ | curses.KEY_HOME | Home key (upward+left arrow) | +---------------------------+----------------------------------------------+ | curses.KEY_BACKSPACE | Backspace (unreliable) | +---------------------------+----------------------------------------------+ | curses.KEY_F0 | Function keys. Up to 64 function keys are | | | supported. | +---------------------------+----------------------------------------------+ | curses.KEY_Fn | Value of function key *n* | +---------------------------+----------------------------------------------+ | curses.KEY_DL | Delete line | +---------------------------+----------------------------------------------+ | curses.KEY_IL | Insert line | +---------------------------+----------------------------------------------+ | curses.KEY_DC | Delete character | +---------------------------+----------------------------------------------+ | curses.KEY_IC | Insert char or enter insert mode | +---------------------------+----------------------------------------------+ | curses.KEY_EIC | Exit insert char mode | +---------------------------+----------------------------------------------+ | curses.KEY_CLEAR | Clear screen | +---------------------------+----------------------------------------------+ | curses.KEY_EOS | Clear to end of screen | +---------------------------+----------------------------------------------+ | curses.KEY_EOL | Clear to end of line | +---------------------------+----------------------------------------------+ | curses.KEY_SF | Scroll 1 line forward | +---------------------------+----------------------------------------------+ | curses.KEY_SR | Scroll 1 line backward (reverse) | +---------------------------+----------------------------------------------+ | curses.KEY_NPAGE | Next page | +---------------------------+----------------------------------------------+ | curses.KEY_PPAGE | Previous page | +---------------------------+----------------------------------------------+ | curses.KEY_STAB | Set tab | +---------------------------+----------------------------------------------+ | curses.KEY_CTAB | Clear tab | +---------------------------+----------------------------------------------+ | curses.KEY_CATAB | Clear all tabs | +---------------------------+----------------------------------------------+ | curses.KEY_ENTER | Enter or send (unreliable) | +---------------------------+----------------------------------------------+ | curses.KEY_SRESET | Soft (partial) reset (unreliable) | +---------------------------+----------------------------------------------+ | curses.KEY_RESET | Reset or hard reset (unreliable) | +---------------------------+----------------------------------------------+ | curses.KEY_PRINT | Print | 
+---------------------------+----------------------------------------------+ | curses.KEY_LL | Home down or bottom (lower left) | +---------------------------+----------------------------------------------+ | curses.KEY_A1 | Upper left of keypad | +---------------------------+----------------------------------------------+ | curses.KEY_A3 | Upper right of keypad | +---------------------------+----------------------------------------------+ | curses.KEY_B2 | Center of keypad | +---------------------------+----------------------------------------------+ | curses.KEY_C1 | Lower left of keypad | +---------------------------+----------------------------------------------+ | curses.KEY_C3 | Lower right of keypad | +---------------------------+----------------------------------------------+ | curses.KEY_BTAB | Back tab | +---------------------------+----------------------------------------------+ | curses.KEY_BEG | Beg (beginning) | +---------------------------+----------------------------------------------+ | curses.KEY_CANCEL | Cancel | +---------------------------+----------------------------------------------+ | curses.KEY_CLOSE | Close | +---------------------------+----------------------------------------------+ | curses.KEY_COMMAND | Cmd (command) | +---------------------------+----------------------------------------------+ | curses.KEY_COPY | Copy | +---------------------------+----------------------------------------------+ | curses.KEY_CREATE | Create | +---------------------------+----------------------------------------------+ | curses.KEY_END | End | +---------------------------+----------------------------------------------+ | curses.KEY_EXIT | Exit | +---------------------------+----------------------------------------------+ | curses.KEY_FIND | Find | +---------------------------+----------------------------------------------+ | curses.KEY_HELP | Help | +---------------------------+----------------------------------------------+ | curses.KEY_MARK | Mark | +---------------------------+----------------------------------------------+ | curses.KEY_MESSAGE | Message | +---------------------------+----------------------------------------------+ | curses.KEY_MOVE | Move | +---------------------------+----------------------------------------------+ | curses.KEY_NEXT | Next | +---------------------------+----------------------------------------------+ | curses.KEY_OPEN | Open | +---------------------------+----------------------------------------------+ | curses.KEY_OPTIONS | Options | +---------------------------+----------------------------------------------+ | curses.KEY_PREVIOUS | Prev (previous) | +---------------------------+----------------------------------------------+ | curses.KEY_REDO | Redo | +---------------------------+----------------------------------------------+ | curses.KEY_REFERENCE | Ref (reference) | +---------------------------+----------------------------------------------+ | curses.KEY_REFRESH | Refresh | +---------------------------+----------------------------------------------+ | curses.KEY_REPLACE | Replace | +---------------------------+----------------------------------------------+ | curses.KEY_RESTART | Restart | +---------------------------+----------------------------------------------+ | curses.KEY_RESUME | Resume | +---------------------------+----------------------------------------------+ | curses.KEY_SAVE | Save | +---------------------------+----------------------------------------------+ | curses.KEY_SBEG | Shifted Beg (beginning) | 
+---------------------------+----------------------------------------------+ | curses.KEY_SCANCEL | Shifted Cancel | +---------------------------+----------------------------------------------+ | curses.KEY_SCOMMAND | Shifted Command | +---------------------------+----------------------------------------------+ | curses.KEY_SCOPY | Shifted Copy | +---------------------------+----------------------------------------------+ | curses.KEY_SCREATE | Shifted Create | +---------------------------+----------------------------------------------+ | curses.KEY_SDC | Shifted Delete char | +---------------------------+----------------------------------------------+ | curses.KEY_SDL | Shifted Delete line | +---------------------------+----------------------------------------------+ | curses.KEY_SELECT | Select | +---------------------------+----------------------------------------------+ | curses.KEY_SEND | Shifted End | +---------------------------+----------------------------------------------+ | curses.KEY_SEOL | Shifted Clear line | +---------------------------+----------------------------------------------+ | curses.KEY_SEXIT | Shifted Exit | +---------------------------+----------------------------------------------+ | curses.KEY_SFIND | Shifted Find | +---------------------------+----------------------------------------------+ | curses.KEY_SHELP | Shifted Help | +---------------------------+----------------------------------------------+ | curses.KEY_SHOME | Shifted Home | +---------------------------+----------------------------------------------+ | curses.KEY_SIC | Shifted Input | +---------------------------+----------------------------------------------+ | curses.KEY_SLEFT | Shifted Left arrow | +---------------------------+----------------------------------------------+ | curses.KEY_SMESSAGE | Shifted Message | +---------------------------+----------------------------------------------+ | curses.KEY_SMOVE | Shifted Move | +---------------------------+----------------------------------------------+ | curses.KEY_SNEXT | Shifted Next | +---------------------------+----------------------------------------------+ | curses.KEY_SOPTIONS | Shifted Options | +---------------------------+----------------------------------------------+ | curses.KEY_SPREVIOUS | Shifted Prev | +---------------------------+----------------------------------------------+ | curses.KEY_SPRINT | Shifted Print | +---------------------------+----------------------------------------------+ | curses.KEY_SREDO | Shifted Redo | +---------------------------+----------------------------------------------+ | curses.KEY_SREPLACE | Shifted Replace | +---------------------------+----------------------------------------------+ | curses.KEY_SRIGHT | Shifted Right arrow | +---------------------------+----------------------------------------------+ | curses.KEY_SRSUME | Shifted Resume | +---------------------------+----------------------------------------------+ | curses.KEY_SSAVE | Shifted Save | +---------------------------+----------------------------------------------+ | curses.KEY_SSUSPEND | Shifted Suspend | +---------------------------+----------------------------------------------+ | curses.KEY_SUNDO | Shifted Undo | +---------------------------+----------------------------------------------+ | curses.KEY_SUSPEND | Suspend | +---------------------------+----------------------------------------------+ | curses.KEY_UNDO | Undo | +---------------------------+----------------------------------------------+ | curses.KEY_MOUSE | Mouse 
event has occurred | +---------------------------+----------------------------------------------+ | curses.KEY_RESIZE | Terminal resize event | +---------------------------+----------------------------------------------+ | curses.KEY_MAX | Maximum key value | +---------------------------+----------------------------------------------+ On VT100s and their software emulations, such as X terminal emulators, there are normally at least four function keys ("KEY_F1", "KEY_F2", "KEY_F3", "KEY_F4") available, and the arrow keys mapped to "KEY_UP", "KEY_DOWN", "KEY_LEFT" and "KEY_RIGHT" in the obvious way. If your machine has a PC keyboard, it is safe to expect arrow keys and twelve function keys (older PC keyboards may have only ten function keys); also, the following keypad mappings are standard: +--------------------+-------------+ | Keycap | Constant | |====================|=============| | "Insert" | KEY_IC | +--------------------+-------------+ | "Delete" | KEY_DC | +--------------------+-------------+ | "Home" | KEY_HOME | +--------------------+-------------+ | "End" | KEY_END | +--------------------+-------------+ | "Page Up" | KEY_PPAGE | +--------------------+-------------+ | "Page Down" | KEY_NPAGE | +--------------------+-------------+ The following table lists characters from the alternate character set. These are inherited from the VT100 terminal, and will generally be available on software emulations such as X terminals. When there is no graphic available, curses falls back on a crude printable ASCII approximation. Note: These are available only after "initscr()" has been called. +--------------------------+--------------------------------------------+ | ACS code | Meaning | |==========================|============================================| | curses.ACS_BBSS | alternate name for upper right corner | +--------------------------+--------------------------------------------+ | curses.ACS_BLOCK | solid square block | +--------------------------+--------------------------------------------+ | curses.ACS_BOARD | board of squares | +--------------------------+--------------------------------------------+ | curses.ACS_BSBS | alternate name for horizontal line | +--------------------------+--------------------------------------------+ | curses.ACS_BSSB | alternate name for upper left corner | +--------------------------+--------------------------------------------+ | curses.ACS_BSSS | alternate name for top tee | +--------------------------+--------------------------------------------+ | curses.ACS_BTEE | bottom tee | +--------------------------+--------------------------------------------+ | curses.ACS_BULLET | bullet | +--------------------------+--------------------------------------------+ | curses.ACS_CKBOARD | checker board (stipple) | +--------------------------+--------------------------------------------+ | curses.ACS_DARROW | arrow pointing down | +--------------------------+--------------------------------------------+ | curses.ACS_DEGREE | degree symbol | +--------------------------+--------------------------------------------+ | curses.ACS_DIAMOND | diamond | +--------------------------+--------------------------------------------+ | curses.ACS_GEQUAL | greater-than-or-equal-to | +--------------------------+--------------------------------------------+ | curses.ACS_HLINE | horizontal line | +--------------------------+--------------------------------------------+ | curses.ACS_LANTERN | lantern symbol | 
+--------------------------+--------------------------------------------+ | curses.ACS_LARROW | left arrow | +--------------------------+--------------------------------------------+ | curses.ACS_LEQUAL | less-than-or-equal-to | +--------------------------+--------------------------------------------+ | curses.ACS_LLCORNER | lower left-hand corner | +--------------------------+--------------------------------------------+ | curses.ACS_LRCORNER | lower right-hand corner | +--------------------------+--------------------------------------------+ | curses.ACS_LTEE | left tee | +--------------------------+--------------------------------------------+ | curses.ACS_NEQUAL | not-equal sign | +--------------------------+--------------------------------------------+ | curses.ACS_PI | letter pi | +--------------------------+--------------------------------------------+ | curses.ACS_PLMINUS | plus-or-minus sign | +--------------------------+--------------------------------------------+ | curses.ACS_PLUS | big plus sign | +--------------------------+--------------------------------------------+ | curses.ACS_RARROW | right arrow | +--------------------------+--------------------------------------------+ | curses.ACS_RTEE | right tee | +--------------------------+--------------------------------------------+ | curses.ACS_S1 | scan line 1 | +--------------------------+--------------------------------------------+ | curses.ACS_S3 | scan line 3 | +--------------------------+--------------------------------------------+ | curses.ACS_S7 | scan line 7 | +--------------------------+--------------------------------------------+ | curses.ACS_S9 | scan line 9 | +--------------------------+--------------------------------------------+ | curses.ACS_SBBS | alternate name for lower right corner | +--------------------------+--------------------------------------------+ | curses.ACS_SBSB | alternate name for vertical line | +--------------------------+--------------------------------------------+ | curses.ACS_SBSS | alternate name for right tee | +--------------------------+--------------------------------------------+ | curses.ACS_SSBB | alternate name for lower left corner | +--------------------------+--------------------------------------------+ | curses.ACS_SSBS | alternate name for bottom tee | +--------------------------+--------------------------------------------+ | curses.ACS_SSSB | alternate name for left tee | +--------------------------+--------------------------------------------+ | curses.ACS_SSSS | alternate name for crossover or big plus | +--------------------------+--------------------------------------------+ | curses.ACS_STERLING | pound sterling | +--------------------------+--------------------------------------------+ | curses.ACS_TTEE | top tee | +--------------------------+--------------------------------------------+ | curses.ACS_UARROW | up arrow | +--------------------------+--------------------------------------------+ | curses.ACS_ULCORNER | upper left corner | +--------------------------+--------------------------------------------+ | curses.ACS_URCORNER | upper right corner | +--------------------------+--------------------------------------------+ | curses.ACS_VLINE | vertical line | +--------------------------+--------------------------------------------+ The following table lists mouse button constants used by "getmouse()": +------------------------------------+-----------------------------------------------+ | Mouse button constant | Meaning | 
|====================================|===============================================|
| curses.BUTTONn_PRESSED             | Mouse button *n* pressed                      |
+------------------------------------+-----------------------------------------------+
| curses.BUTTONn_RELEASED            | Mouse button *n* released                     |
+------------------------------------+-----------------------------------------------+
| curses.BUTTONn_CLICKED             | Mouse button *n* clicked                      |
+------------------------------------+-----------------------------------------------+
| curses.BUTTONn_DOUBLE_CLICKED      | Mouse button *n* double clicked               |
+------------------------------------+-----------------------------------------------+
| curses.BUTTONn_TRIPLE_CLICKED      | Mouse button *n* triple clicked               |
+------------------------------------+-----------------------------------------------+
| curses.BUTTON_SHIFT                | Shift was down during button state change     |
+------------------------------------+-----------------------------------------------+
| curses.BUTTON_CTRL                 | Control was down during button state change   |
+------------------------------------+-----------------------------------------------+
| curses.BUTTON_ALT                  | Alt was down during button state change       |
+------------------------------------+-----------------------------------------------+

Changed in version 3.10: The "BUTTON5_*" constants are now exposed if
they are provided by the underlying curses library.

The following table lists the predefined colors:

+---------------------------+------------------------------+
| Constant                  | Color                        |
|===========================|==============================|
| curses.COLOR_BLACK        | Black                        |
+---------------------------+------------------------------+
| curses.COLOR_BLUE         | Blue                         |
+---------------------------+------------------------------+
| curses.COLOR_CYAN         | Cyan (light greenish blue)   |
+---------------------------+------------------------------+
| curses.COLOR_GREEN        | Green                        |
+---------------------------+------------------------------+
| curses.COLOR_MAGENTA      | Magenta (purplish red)       |
+---------------------------+------------------------------+
| curses.COLOR_RED          | Red                          |
+---------------------------+------------------------------+
| curses.COLOR_WHITE        | White                        |
+---------------------------+------------------------------+
| curses.COLOR_YELLOW       | Yellow                       |
+---------------------------+------------------------------+

"curses.textpad" — Text input widget for curses programs
********************************************************

The "curses.textpad" module provides a "Textbox" class that handles
elementary text editing in a curses window, supporting a set of
keybindings resembling those of Emacs (thus, also of Netscape
Navigator, BBedit 6.x, FrameMaker, and many other programs). The
module also provides a rectangle-drawing function useful for framing
text boxes or for other purposes.

The module "curses.textpad" defines the following function:

curses.textpad.rectangle(win, uly, ulx, lry, lrx)

   Draw a rectangle. The first argument must be a window object; the
   remaining arguments are coordinates relative to that window. The
   second and third arguments are the y and x coordinates of the
   upper left hand corner of the rectangle to be drawn; the fourth
   and fifth arguments are the y and x coordinates of the lower right
   hand corner. The rectangle will be drawn using VT100/IBM PC forms
   characters on terminals that make this possible (including xterm
   and most other software terminal emulators). Otherwise it will be
   drawn with ASCII dashes, vertical bars, and plus signs.
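A sketch combining "rectangle()" with the "Textbox" class described
below (geometry and messages are arbitrary):

   import curses
   from curses import textpad

   def main(stdscr):
       textpad.rectangle(stdscr, 2, 2, 8, 42)   # uly, ulx, lry, lrx
       stdscr.refresh()
       win = curses.newwin(5, 38, 3, 3)         # interior of the frame
       box = textpad.Textbox(win)
       text = box.edit()                        # terminate with Control-G
       stdscr.addstr(10, 2, "You entered: " + text.strip())
       stdscr.getch()

   curses.wrapper(main)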
Textbox objects =============== You can instantiate a "Textbox" object as follows: class curses.textpad.Textbox(win) Return a textbox widget object. The *win* argument should be a curses window object in which the textbox is to be contained. The edit cursor of the textbox is initially located at the upper left hand corner of the containing window, with coordinates "(0, 0)". The instance’s "stripspaces" flag is initially on. "Textbox" objects have the following methods: edit([validator]) This is the entry point you will normally use. It accepts editing keystrokes until one of the termination keystrokes is entered. If *validator* is supplied, it must be a function. It will be called for each keystroke entered with the keystroke as a parameter; command dispatch is done on the result. This method returns the window contents as a string; whether blanks in the window are included is affected by the "stripspaces" attribute. do_command(ch) Process a single command keystroke. Here are the supported special keystrokes: +--------------------+---------------------------------------------+ | Keystroke | Action | |====================|=============================================| | "Control"-"A" | Go to left edge of window. | +--------------------+---------------------------------------------+ | "Control"-"B" | Cursor left, wrapping to previous line if | | | appropriate. | +--------------------+---------------------------------------------+ | "Control"-"D" | Delete character under cursor. | +--------------------+---------------------------------------------+ | "Control"-"E" | Go to right edge (stripspaces off) or end | | | of line (stripspaces on). | +--------------------+---------------------------------------------+ | "Control"-"F" | Cursor right, wrapping to next line when | | | appropriate. | +--------------------+---------------------------------------------+ | "Control"-"G" | Terminate, returning the window contents. | +--------------------+---------------------------------------------+ | "Control"-"H" | Delete character backward. | +--------------------+---------------------------------------------+ | "Control"-"J" | Terminate if the window is 1 line, | | | otherwise insert newline. | +--------------------+---------------------------------------------+ | "Control"-"K" | If line is blank, delete it, otherwise | | | clear to end of line. | +--------------------+---------------------------------------------+ | "Control"-"L" | Refresh screen. | +--------------------+---------------------------------------------+ | "Control"-"N" | Cursor down; move down one line. | +--------------------+---------------------------------------------+ | "Control"-"O" | Insert a blank line at cursor location. | +--------------------+---------------------------------------------+ | "Control"-"P" | Cursor up; move up one line. | +--------------------+---------------------------------------------+ Move operations do nothing if the cursor is at an edge where the movement is not possible. 
The following synonyms are supported where possible: +----------------------------------+--------------------+ | Constant | Keystroke | |==================================|====================| | "KEY_LEFT" | "Control"-"B" | +----------------------------------+--------------------+ | "KEY_RIGHT" | "Control"-"F" | +----------------------------------+--------------------+ | "KEY_UP" | "Control"-"P" | +----------------------------------+--------------------+ | "KEY_DOWN" | "Control"-"N" | +----------------------------------+--------------------+ | "KEY_BACKSPACE" | "Control"-"h" | +----------------------------------+--------------------+ All other keystrokes are treated as a command to insert the given character and move right (with line wrapping). gather() Return the window contents as a string; whether blanks in the window are included is affected by the "stripspaces" member. stripspaces This attribute is a flag which controls the interpretation of blanks in the window. When it is on, trailing blanks on each line are ignored; any cursor motion that would land the cursor on a trailing blank goes to the end of that line instead, and trailing blanks are stripped when the window contents are gathered. Custom Python Interpreters ************************** The modules described in this chapter allow writing interfaces similar to Python’s interactive interpreter. If you want a Python interpreter that supports some special feature in addition to the Python language, you should look at the "code" module. (The "codeop" module is lower- level, used to support compiling a possibly incomplete chunk of Python code.) The full list of modules described in this chapter is: * "code" — Interpreter base classes * Interactive Interpreter Objects * Interactive Console Objects * "codeop" — Compile Python code "dataclasses" — Data Classes **************************** **Source code:** Lib/dataclasses.py ====================================================================== This module provides a decorator and functions for automatically adding generated *special methods* such as "__init__()" and "__repr__()" to user-defined classes. It was originally described in **PEP 557**. The member variables to use in these generated methods are defined using **PEP 526** type annotations. For example, this code: from dataclasses import dataclass @dataclass class InventoryItem: """Class for keeping track of an item in inventory.""" name: str unit_price: float quantity_on_hand: int = 0 def total_cost(self) -> float: return self.unit_price * self.quantity_on_hand will add, among other things, a "__init__()" that looks like: def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0): self.name = name self.unit_price = unit_price self.quantity_on_hand = quantity_on_hand Note that this method is automatically added to the class: it is not directly specified in the "InventoryItem" definition shown above. Added in version 3.7. Module contents =============== @dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False) This function is a *decorator* that is used to add generated *special methods* to classes, as described below. The "@dataclass" decorator examines the class to find "field"s. A "field" is defined as a class variable that has a *type annotation*. With two exceptions described below, nothing in "@dataclass" examines the type specified in the variable annotation. 
The order of the fields in all of the generated methods is the order in which they appear in the class definition. The "@dataclass" decorator will add various “dunder” methods to the class, described below. If any of the added methods already exist in the class, the behavior depends on the parameter, as documented below. The decorator returns the same class that it is called on; no new class is created. If "@dataclass" is used just as a simple decorator with no parameters, it acts as if it has the default values documented in this signature. That is, these three uses of "@dataclass" are equivalent: @dataclass class C: ... @dataclass() class C: ... @dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False) class C: ... The parameters to "@dataclass" are: * *init*: If true (the default), a "__init__()" method will be generated. If the class already defines "__init__()", this parameter is ignored. * *repr*: If true (the default), a "__repr__()" method will be generated. The generated repr string will have the class name and the name and repr of each field, in the order they are defined in the class. Fields that are marked as being excluded from the repr are not included. For example: "InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)". If the class already defines "__repr__()", this parameter is ignored. * *eq*: If true (the default), an "__eq__()" method will be generated. This method compares the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type. If the class already defines "__eq__()", this parameter is ignored. * *order*: If true (the default is "False"), "__lt__()", "__le__()", "__gt__()", and "__ge__()" methods will be generated. These compare the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type. If *order* is true and *eq* is false, a "ValueError" is raised. If the class already defines any of "__lt__()", "__le__()", "__gt__()", or "__ge__()", then "TypeError" is raised. * *unsafe_hash*: If "False" (the default), a "__hash__()" method is generated according to how *eq* and *frozen* are set. "__hash__()" is used by built-in "hash()", and when objects are added to hashed collections such as dictionaries and sets. Having a "__hash__()" implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer’s intent, the existence and behavior of "__eq__()", and the values of the *eq* and *frozen* flags in the "@dataclass" decorator. By default, "@dataclass" will not implicitly add a "__hash__()" method unless it is safe to do so. Neither will it add or change an existing explicitly defined "__hash__()" method. Setting the class attribute "__hash__ = None" has a specific meaning to Python, as described in the "__hash__()" documentation. If "__hash__()" is not explicitly defined, or if it is set to "None", then "@dataclass" *may* add an implicit "__hash__()" method. Although not recommended, you can force "@dataclass" to create a "__hash__()" method with "unsafe_hash=True". This might be the case if your class is logically immutable but can still be mutated. This is a specialized use case and should be considered carefully. Here are the rules governing implicit creation of a "__hash__()" method. 
Note that you cannot both have an explicit "__hash__()" method in your dataclass and set "unsafe_hash=True"; this will result in a "TypeError". If *eq* and *frozen* are both true, by default "@dataclass" will generate a "__hash__()" method for you. If *eq* is true and *frozen* is false, "__hash__()" will be set to "None", marking it unhashable (which it is, since it is mutable). If *eq* is false, "__hash__()" will be left untouched meaning the "__hash__()" method of the superclass will be used (if the superclass is "object", this means it will fall back to id-based hashing). * *frozen*: If true (the default is "False"), assigning to fields will generate an exception. This emulates read-only frozen instances. If "__setattr__()" or "__delattr__()" is defined in the class, then "TypeError" is raised. See the discussion below. * *match_args*: If true (the default is "True"), the "__match_args__" tuple will be created from the list of non keyword-only parameters to the generated "__init__()" method (even if "__init__()" is not generated, see above). If false, or if "__match_args__" is already defined in the class, then "__match_args__" will not be generated. Added in version 3.10. * *kw_only*: If true (the default value is "False"), then all fields will be marked as keyword-only. If a field is marked as keyword-only, then the only effect is that the "__init__()" parameter generated from a keyword-only field must be specified with a keyword when "__init__()" is called. See the *parameter* glossary entry for details. Also see the "KW_ONLY" section. Keyword-only fields are not included in "__match_args__". Added in version 3.10. * *slots*: If true (the default is "False"), "__slots__" attribute will be generated and new class will be returned instead of the original one. If "__slots__" is already defined in the class, then "TypeError" is raised. Warning: Calling no-arg "super()" in dataclasses using "slots=True" will result in the following exception being raised: "TypeError: super(type, obj): obj must be an instance or subtype of type". The two-arg "super()" is a valid workaround. See gh-90562 for full details. Warning: Passing parameters to a base class "__init_subclass__()" when using "slots=True" will result in a "TypeError". Either use "__init_subclass__" with no parameters or use default values as a workaround. See gh-91126 for full details. Added in version 3.10. Changed in version 3.11: If a field name is already included in the "__slots__" of a base class, it will not be included in the generated "__slots__" to prevent overriding them. Therefore, do not use "__slots__" to retrieve the field names of a dataclass. Use "fields()" instead. To be able to determine inherited slots, base class "__slots__" may be any iterable, but *not* an iterator. * *weakref_slot*: If true (the default is "False"), add a slot named “__weakref__”, which is required to make an instance "weakref-able". It is an error to specify "weakref_slot=True" without also specifying "slots=True". Added in version 3.11. "field"s may optionally specify a default value, using normal Python syntax: @dataclass class C: a: int # 'a' has no default value b: int = 0 # assign a default value for 'b' In this example, both "a" and "b" will be included in the added "__init__()" method, which will be defined as: def __init__(self, a: int, b: int = 0): "TypeError" will be raised if a field without a default value follows a field with a default value. This is true whether this occurs in a single class, or as a result of class inheritance. 
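The interplay of these parameters can be summarized in a short sketch
(the class and field names are illustrative only):

   from dataclasses import dataclass

   @dataclass(order=True, frozen=True)
   class Version:
       major: int
       minor: int = 0

   v1 = Version(3, 11)
   v2 = Version(3, 12)
   assert v1 < v2                            # __lt__() from order=True
   assert hash(v1) == hash(Version(3, 11))   # eq and frozen imply __hash__()

   # A field without a default may not follow one with a default:
   #
   # @dataclass
   # class Broken:
   #     a: int = 0
   #     b: int        # TypeError at class creation time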
dataclasses.field(*, default=MISSING, default_factory=MISSING, init=True, repr=True, hash=None, compare=True, metadata=None, kw_only=MISSING)

   For common and simple use cases, no other functionality is
   required. There are, however, some dataclass features that require
   additional per-field information. To satisfy this need for
   additional information, you can replace the default field value
   with a call to the provided "field()" function. For example:

      @dataclass
      class C:
          mylist: list[int] = field(default_factory=list)

      c = C()
      c.mylist += [1, 2, 3]

   As shown above, the "MISSING" value is a sentinel object used to
   detect if some parameters are provided by the user. This sentinel
   is used because "None" is a valid value for some parameters with a
   distinct meaning. No code should directly use the "MISSING" value.

   The parameters to "field()" are:

   * *default*: If provided, this will be the default value for this
     field. This is needed because the "field()" call itself replaces
     the normal position of the default value.

   * *default_factory*: If provided, it must be a zero-argument
     callable that will be called when a default value is needed for
     this field. Among other purposes, this can be used to specify
     fields with mutable default values, as discussed below. It is an
     error to specify both *default* and *default_factory*.

   * *init*: If true (the default), this field is included as a
     parameter to the generated "__init__()" method.

   * *repr*: If true (the default), this field is included in the
     string returned by the generated "__repr__()" method.

   * *hash*: This can be a bool or "None". If true, this field is
     included in the generated "__hash__()" method. If false, this
     field is excluded from the generated "__hash__()". If "None" (the
     default), use the value of *compare*: this would normally be the
     expected behavior, since a field should be included in the hash
     if it’s used for comparisons. Setting this value to anything
     other than "None" is discouraged.

     One possible reason to set "hash=False" but "compare=True" would
     be if a field is expensive to compute a hash value for, that
     field is needed for equality testing, and there are other fields
     that contribute to the type’s hash value. Even if a field is
     excluded from the hash, it will still be used for comparisons.

   * *compare*: If true (the default), this field is included in the
     generated equality and comparison methods ("__eq__()",
     "__gt__()", et al.).

   * *metadata*: This can be a mapping or "None". "None" is treated as
     an empty dict. This value is wrapped in "MappingProxyType()" to
     make it read-only, and exposed on the "Field" object. It is not
     used at all by Data Classes, and is provided as a third-party
     extension mechanism. Multiple third-parties can each have their
     own key, to use as a namespace in the metadata.

   * *kw_only*: If true, this field will be marked as keyword-only.
     This is used when the generated "__init__()" method’s parameters
     are computed.

     Keyword-only fields are also not included in "__match_args__".

     Added in version 3.10.

   If the default value of a field is specified by a call to
   "field()", then the class attribute for this field will be replaced
   by the specified *default* value. If *default* is not provided,
   then the class attribute will be deleted. The intent is that after
   the "@dataclass" decorator runs, the class attributes will all
   contain the default values for the fields, just as if the default
   value itself were specified.
   For example, after:

      @dataclass
      class C:
          x: int
          y: int = field(repr=False)
          z: int = field(repr=False, default=10)
          t: int = 20

   The class attribute "C.z" will be "10", the class attribute "C.t"
   will be "20", and the class attributes "C.x" and "C.y" will not be
   set.

class dataclasses.Field

   "Field" objects describe each defined field. These objects are
   created internally, and are returned by the "fields()" module-level
   method (see below). Users should never instantiate a "Field" object
   directly. Its documented attributes are:

   * "name": The name of the field.

   * "type": The type of the field.

   * "default", "default_factory", "init", "repr", "hash", "compare",
     "metadata", and "kw_only" have the identical meaning and values
     as they do in the "field()" function.

   Other attributes may exist, but they are private and must not be
   inspected or relied on.

class dataclasses.InitVar

   "InitVar[T]" type annotations describe variables that are init-
   only. Fields annotated with "InitVar" are considered pseudo-fields,
   and thus are neither returned by the "fields()" function nor used
   in any way except adding them as parameters to "__init__()" and an
   optional "__post_init__()".

dataclasses.fields(class_or_instance)

   Returns a tuple of "Field" objects that define the fields for this
   dataclass. Accepts either a dataclass, or an instance of a
   dataclass. Raises "TypeError" if not passed a dataclass or instance
   of one. Does not return pseudo-fields which are "ClassVar" or
   "InitVar".

dataclasses.asdict(obj, *, dict_factory=dict)

   Converts the dataclass *obj* to a dict (by using the factory
   function *dict_factory*). Each dataclass is converted to a dict of
   its fields, as "name: value" pairs. dataclasses, dicts, lists, and
   tuples are recursed into.

   Other objects are copied with "copy.deepcopy()".

   Example of using "asdict()" on nested dataclasses:

      @dataclass
      class Point:
          x: int
          y: int

      @dataclass
      class C:
          mylist: list[Point]

      p = Point(10, 20)
      assert asdict(p) == {'x': 10, 'y': 20}

      c = C([Point(0, 0), Point(10, 4)])
      assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

   To create a shallow copy, the following workaround may be used:

      {field.name: getattr(obj, field.name) for field in fields(obj)}

   "asdict()" raises "TypeError" if *obj* is not a dataclass instance.

dataclasses.astuple(obj, *, tuple_factory=tuple)

   Converts the dataclass *obj* to a tuple (by using the factory
   function *tuple_factory*). Each dataclass is converted to a tuple
   of its field values. dataclasses, dicts, lists, and tuples are
   recursed into.

   Other objects are copied with "copy.deepcopy()".

   Continuing from the previous example:

      assert astuple(p) == (10, 20)
      assert astuple(c) == ([(0, 0), (10, 4)],)

   To create a shallow copy, the following workaround may be used:

      tuple(getattr(obj, field.name) for field in dataclasses.fields(obj))

   "astuple()" raises "TypeError" if *obj* is not a dataclass
   instance.

dataclasses.make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False, module=None)

   Creates a new dataclass with name *cls_name*, fields as defined in
   *fields*, base classes as given in *bases*, and initialized with a
   namespace as given in *namespace*.

   *fields* is an iterable whose elements are each either "name",
   "(name, type)", or "(name, type, Field)". If just "name" is
   supplied, "typing.Any" is used for "type".
   The values of *init*, *repr*, *eq*, *order*, *unsafe_hash*,
   *frozen*, *match_args*, *kw_only*, *slots*, and *weakref_slot* have
   the same meaning as they do in "@dataclass".

   If *module* is defined, the "__module__" attribute of the dataclass
   is set to that value. By default, it is set to the module name of
   the caller.

   This function is not strictly required, because any Python
   mechanism for creating a new class with "__annotations__" can then
   apply the "@dataclass" function to convert that class to a
   dataclass. This function is provided as a convenience. For example:

      C = make_dataclass('C',
                         [('x', int),
                           'y',
                          ('z', int, field(default=5))],
                         namespace={'add_one': lambda self: self.x + 1})

   Is equivalent to:

      @dataclass
      class C:
          x: int
          y: 'typing.Any'
          z: int = 5

          def add_one(self):
              return self.x + 1

dataclasses.replace(obj, /, **changes)

   Creates a new object of the same type as *obj*, replacing fields
   with values from *changes*. If *obj* is not a Data Class, raises
   "TypeError". If keys in *changes* are not field names of the given
   dataclass, raises "TypeError".

   The newly returned object is created by calling the "__init__()"
   method of the dataclass. This ensures that "__post_init__()", if
   present, is also called.

   Init-only variables without default values, if any exist, must be
   specified on the call to "replace()" so that they can be passed to
   "__init__()" and "__post_init__()".

   It is an error for *changes* to contain any fields that are defined
   as having "init=False". A "ValueError" will be raised in this case.

   Be forewarned about how "init=False" fields work during a call to
   "replace()". They are not copied from the source object, but rather
   are initialized in "__post_init__()", if they’re initialized at
   all. It is expected that "init=False" fields will be rarely and
   judiciously used. If they are used, it might be wise to have
   alternate class constructors, or perhaps a custom "replace()" (or
   similarly named) method which handles instance copying.

   Dataclass instances are also supported by generic function
   "copy.replace()".

dataclasses.is_dataclass(obj)

   Return "True" if its parameter is a dataclass (including subclasses
   of a dataclass) or an instance of one, otherwise return "False".

   If you need to know if a class is an instance of a dataclass (and
   not a dataclass itself), then add a further check for "not
   isinstance(obj, type)":

      def is_dataclass_instance(obj):
          return is_dataclass(obj) and not isinstance(obj, type)

dataclasses.MISSING

   A sentinel value signifying a missing default or default_factory.

dataclasses.KW_ONLY

   A sentinel value used as a type annotation. Any fields after a
   pseudo-field with the type of "KW_ONLY" are marked as keyword-only
   fields. Note that a pseudo-field of type "KW_ONLY" is otherwise
   completely ignored. This includes the name of such a field. By
   convention, a name of "_" is used for a "KW_ONLY" field. Keyword-
   only fields signify "__init__()" parameters that must be specified
   as keywords when the class is instantiated.

   In this example, the fields "y" and "z" will be marked as keyword-
   only fields:

      @dataclass
      class Point:
          x: float
          _: KW_ONLY
          y: float
          z: float

      p = Point(0, y=1.5, z=2.0)

   In a single dataclass, it is an error to specify more than one
   field whose type is "KW_ONLY".

   Added in version 3.10.

exception dataclasses.FrozenInstanceError

   Raised when an implicitly defined "__setattr__()" or
   "__delattr__()" is called on a dataclass which was defined with
   "frozen=True". It is a subclass of "AttributeError".
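   For example, a minimal sketch of the exception being raised (the
   "Point" class is illustrative, and the exact message text is an
   implementation detail that may vary):

      >>> @dataclass(frozen=True)
      ... class Point:
      ...     x: int
      ...     y: int
      ...
      >>> p = Point(1, 2)
      >>> p.x = 3
      Traceback (most recent call last):
        ...
      dataclasses.FrozenInstanceError: cannot assign to field 'x'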
Post-init processing
====================

dataclasses.__post_init__()

   When defined on the class, it will be called by the generated
   "__init__()", normally as "self.__post_init__()". However, if any
   "InitVar" fields are defined, they will also be passed to
   "__post_init__()" in the order they were defined in the class. If
   no "__init__()" method is generated, then "__post_init__()" will
   not automatically be called.

   Among other uses, this allows for initializing field values that
   depend on one or more other fields. For example:

      @dataclass
      class C:
          a: float
          b: float
          c: float = field(init=False)

          def __post_init__(self):
              self.c = self.a + self.b

The "__init__()" method generated by "@dataclass" does not call base
class "__init__()" methods. If the base class has an "__init__()"
method that has to be called, it is common to call this method in a
"__post_init__()" method:

   class Rectangle:
       def __init__(self, height, width):
           self.height = height
           self.width = width

   @dataclass
   class Square(Rectangle):
       side: float

       def __post_init__(self):
           super().__init__(self.side, self.side)

Note, however, that in general the dataclass-generated "__init__()"
methods don’t need to be called, since the derived dataclass will take
care of initializing all fields of any base class that is a dataclass
itself.

See the section below on init-only variables for ways to pass
parameters to "__post_init__()". Also see the warning about how
"replace()" handles "init=False" fields.

Class variables
===============

One of the few places where "@dataclass" actually inspects the type of
a field is to determine if a field is a class variable as defined in
**PEP 526**. It does this by checking if the type of the field is
"typing.ClassVar". If a field is a "ClassVar", it is excluded from
consideration as a field and is ignored by the dataclass mechanisms.
Such "ClassVar" pseudo-fields are not returned by the module-level
"fields()" function.

Init-only variables
===================

Another place where "@dataclass" inspects a type annotation is to
determine if a field is an init-only variable. It does this by seeing
if the type of a field is of type "InitVar". If a field is an
"InitVar", it is considered a pseudo-field called an init-only field.
As it is not a true field, it is not returned by the module-level
"fields()" function. Init-only fields are added as parameters to the
generated "__init__()" method, and are passed to the optional
"__post_init__()" method. They are not otherwise used by dataclasses.

For example, suppose a field will be initialized from a database, if a
value is not provided when creating the class:

   @dataclass
   class C:
       i: int
       j: int | None = None
       database: InitVar[DatabaseType | None] = None

       def __post_init__(self, database):
           if self.j is None and database is not None:
               self.j = database.lookup('j')

   c = C(10, database=my_database)

In this case, "fields()" will return "Field" objects for "i" and "j",
but not for "database".

Frozen instances
================

It is not possible to create truly immutable Python objects. However,
by passing "frozen=True" to the "@dataclass" decorator you can emulate
immutability. In that case, dataclasses will add "__setattr__()" and
"__delattr__()" methods to the class. These methods will raise a
"FrozenInstanceError" when invoked.

There is a tiny performance penalty when using "frozen=True":
"__init__()" cannot use simple assignment to initialize fields, and
must use "object.__setattr__()".
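To make that concrete, the generated "__init__()" of a frozen
dataclass behaves roughly like the following hand-written sketch (the
"FrozenPoint" class is hypothetical, and this is an approximation, not
the actual generated code):

   class FrozenPoint:
       # Approximation of the __init__() that @dataclass(frozen=True)
       # would generate for two fields, x and y.
       def __init__(self, x, y):
           # Plain "self.x = x" would call the overriding
           # __setattr__() and raise FrozenInstanceError, so the
           # generated code bypasses it with object.__setattr__():
           object.__setattr__(self, 'x', x)
           object.__setattr__(self, 'y', y)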
Inheritance
===========

When the dataclass is being created by the "@dataclass" decorator, it
looks through all of the class’s base classes in reverse MRO (that is,
starting at "object") and, for each dataclass that it finds, adds the
fields from that base class to an ordered mapping of fields. After all
of the base class fields are added, it adds its own fields to the
ordered mapping. All of the generated methods will use this combined,
calculated ordered mapping of fields. Because the fields are in
insertion order, derived classes override base classes. An example:

   @dataclass
   class Base:
       x: Any = 15.0
       y: int = 0

   @dataclass
   class C(Base):
       z: int = 10
       x: int = 15

The final list of fields is, in order, "x", "y", "z". The final type
of "x" is "int", as specified in class "C".

The generated "__init__()" method for "C" will look like:

   def __init__(self, x: int = 15, y: int = 0, z: int = 10):

Re-ordering of keyword-only parameters in "__init__()"
======================================================

After the parameters needed for "__init__()" are computed, any
keyword-only parameters are moved to come after all regular (non-
keyword-only) parameters. This is a requirement of how keyword-only
parameters are implemented in Python: they must come after non-
keyword-only parameters.

In this example, "Base.y", "Base.w", and "D.t" are keyword-only
fields, and "Base.x" and "D.z" are regular fields:

   @dataclass
   class Base:
       x: Any = 15.0
       _: KW_ONLY
       y: int = 0
       w: int = 1

   @dataclass
   class D(Base):
       z: int = 10
       t: int = field(kw_only=True, default=0)

The generated "__init__()" method for "D" will look like:

   def __init__(self, x: Any = 15.0, z: int = 10, *, y: int = 0,
                w: int = 1, t: int = 0):

Note that the parameters have been re-ordered from how they appear in
the list of fields: parameters derived from regular fields are
followed by parameters derived from keyword-only fields.

The relative ordering of keyword-only parameters is maintained in the
re-ordered "__init__()" parameter list.

Default factory functions
=========================

If a "field()" specifies a *default_factory*, it is called with zero
arguments when a default value for the field is needed. For example,
to create a new instance of a list, use:

   mylist: list = field(default_factory=list)

If a field is excluded from "__init__()" (using "init=False") and the
field also specifies *default_factory*, then the default factory
function will always be called from the generated "__init__()"
function. This happens because there is no other way to give the field
an initial value.

Mutable default values
======================

Python stores default member variable values in class attributes.
Consider this example, not using dataclasses:

   class C:
       x = []

       def add(self, element):
           self.x.append(element)

   o1 = C()
   o2 = C()
   o1.add(1)
   o2.add(2)
   assert o1.x == [1, 2]
   assert o1.x is o2.x

Note that the two instances of class "C" share the same class variable
"x", as expected.

Using dataclasses, *if* this code was valid:

   @dataclass
   class D:
       x: list = []      # This code raises ValueError

       def add(self, element):
           self.x.append(element)

it would generate code similar to:

   class D:
       x = []

       def __init__(self, x=x):
           self.x = x

       def add(self, element):
           self.x.append(element)

   assert D().x is D().x

This has the same issue as the original example using class "C". That
is, two instances of class "D" that do not specify a value for "x"
when creating a class instance will share the same copy of "x".
Because dataclasses just use normal Python class creation they also
share this behavior. There is no general way for Data Classes to
detect this condition. Instead, the "@dataclass" decorator will raise
a "ValueError" if it detects an unhashable default parameter. The
assumption is that if a value is unhashable, it is mutable. This is a
partial solution, but it does protect against many common errors.

Using default factory functions is a way to create new instances of
mutable types as default values for fields:

   @dataclass
   class D:
       x: list = field(default_factory=list)

   assert D().x is not D().x

Changed in version 3.11: Instead of looking for and disallowing
objects of type "list", "dict", or "set", unhashable objects are now
not allowed as default values. Unhashability is used to approximate
mutability.

Descriptor-typed fields
=======================

Fields that are assigned descriptor objects as their default value
have the following special behaviors:

* The value for the field passed to the dataclass’s "__init__()"
  method is passed to the descriptor’s "__set__()" method rather than
  overwriting the descriptor object.

* Similarly, when getting or setting the field, the descriptor’s
  "__get__()" or "__set__()" method is called rather than returning or
  overwriting the descriptor object.

* To determine whether a field contains a default value, "@dataclass"
  will call the descriptor’s "__get__()" method using its class access
  form: "descriptor.__get__(obj=None, type=cls)". If the descriptor
  returns a value in this case, it will be used as the field’s
  default. On the other hand, if the descriptor raises
  "AttributeError" in this situation, no default value will be
  provided for the field.

   class IntConversionDescriptor:
       def __init__(self, *, default):
           self._default = default

       def __set_name__(self, owner, name):
           self._name = "_" + name

       def __get__(self, obj, type):
           if obj is None:
               return self._default

           return getattr(obj, self._name, self._default)

       def __set__(self, obj, value):
           setattr(obj, self._name, int(value))

   @dataclass
   class InventoryItem:
       quantity_on_hand: IntConversionDescriptor = IntConversionDescriptor(default=100)

   i = InventoryItem()
   print(i.quantity_on_hand)   # 100
   i.quantity_on_hand = 2.5    # calls __set__ with 2.5
   print(i.quantity_on_hand)   # 2

Note that if a field is annotated with a descriptor type, but is not
assigned a descriptor object as its default value, the field will act
like a normal field.

Data Types
**********

The modules described in this chapter provide a variety of specialized
data types such as dates and times, fixed-type arrays, heap queues,
double-ended queues, and enumerations.

Python also provides some built-in data types, in particular, "dict",
"list", "set" and "frozenset", and "tuple". The "str" class is used to
hold Unicode strings, and the "bytes" and "bytearray" classes are used
to hold binary data.
The following modules are documented in this chapter: * "datetime" — Basic date and time types * Aware and Naive Objects * Constants * Available Types * Common Properties * Determining if an Object is Aware or Naive * "timedelta" Objects * Examples of usage: "timedelta" * "date" Objects * Examples of Usage: "date" * "datetime" Objects * Examples of Usage: "datetime" * "time" Objects * Examples of Usage: "time" * "tzinfo" Objects * "timezone" Objects * "strftime()" and "strptime()" Behavior * "strftime()" and "strptime()" Format Codes * Technical Detail * "zoneinfo" — IANA time zone support * Using "ZoneInfo" * Data sources * Configuring the data sources * Compile-time configuration * Environment configuration * Runtime configuration * The "ZoneInfo" class * String representations * Pickle serialization * Functions * Globals * Exceptions and warnings * "calendar" — General calendar-related functions * Command-Line Usage * "collections" — Container datatypes * "ChainMap" objects * "ChainMap" Examples and Recipes * "Counter" objects * "deque" objects * "deque" Recipes * "defaultdict" objects * "defaultdict" Examples * "namedtuple()" Factory Function for Tuples with Named Fields * "OrderedDict" objects * "OrderedDict" Examples and Recipes * "UserDict" objects * "UserList" objects * "UserString" objects * "collections.abc" — Abstract Base Classes for Containers * Collections Abstract Base Classes * Collections Abstract Base Classes – Detailed Descriptions * Examples and Recipes * "heapq" — Heap queue algorithm * Basic Examples * Priority Queue Implementation Notes * Theory * "bisect" — Array bisection algorithm * Performance Notes * Searching Sorted Lists * Examples * "array" — Efficient arrays of numeric values * "weakref" — Weak references * Weak Reference Objects * Example * Finalizer Objects * Comparing finalizers with "__del__()" methods * "types" — Dynamic type creation and names for built-in types * Dynamic Type Creation * Standard Interpreter Types * Additional Utility Classes and Functions * Coroutine Utility Functions * "copy" — Shallow and deep copy operations * "pprint" — Data pretty printer * Functions * PrettyPrinter Objects * Example * "reprlib" — Alternate "repr()" implementation * Repr Objects * Subclassing Repr Objects * "enum" — Support for enumerations * Module Contents * Data Types * Supported "__dunder__" names * Supported "_sunder_" names * Utilities and Decorators * Notes * "graphlib" — Functionality to operate with graph-like structures * Exceptions "datetime" — Basic date and time types ************************************** **Source code:** Lib/datetime.py ====================================================================== The "datetime" module supplies classes for manipulating dates and times. While date and time arithmetic is supported, the focus of the implementation is on efficient attribute extraction for output formatting and manipulation. Tip: Skip to the format codes. See also: Module "calendar" General calendar related functions. Module "time" Time access and conversions. Module "zoneinfo" Concrete time zones representing the IANA time zone database. Package dateutil Third-party library with expanded time zone and parsing support. Package DateType Third-party library that introduces distinct static types to e.g. allow *static type checkers* to differentiate between naive and aware datetimes. 
Aware and Naive Objects
=======================

Date and time objects may be categorized as “aware” or “naive”
depending on whether or not they include time zone information.

With sufficient knowledge of applicable algorithmic and political time
adjustments, such as time zone and daylight saving time information,
an **aware** object can locate itself relative to other aware objects.
An aware object represents a specific moment in time that is not open
to interpretation. [1]

A **naive** object does not contain enough information to
unambiguously locate itself relative to other date/time objects.
Whether a naive object represents Coordinated Universal Time (UTC),
local time, or time in some other time zone is purely up to the
program, just like it is up to the program whether a particular number
represents metres, miles, or mass. Naive objects are easy to
understand and to work with, at the cost of ignoring some aspects of
reality.

For applications requiring aware objects, "datetime" and "time"
objects have an optional time zone information attribute, "tzinfo",
that can be set to an instance of a subclass of the abstract "tzinfo"
class. These "tzinfo" objects capture information about the offset
from UTC time, the time zone name, and whether daylight saving time is
in effect.

Only one concrete "tzinfo" class, the "timezone" class, is supplied by
the "datetime" module. The "timezone" class can represent simple time
zones with fixed offsets from UTC, such as UTC itself or North
American EST and EDT time zones. Supporting time zones at deeper
levels of detail is up to the application. The rules for time
adjustment across the world are more political than rational, change
frequently, and there is no standard suitable for every application
aside from UTC.

Constants
=========

The "datetime" module exports the following constants:

datetime.MINYEAR

   The smallest year number allowed in a "date" or "datetime" object.
   "MINYEAR" is 1.

datetime.MAXYEAR

   The largest year number allowed in a "date" or "datetime" object.
   "MAXYEAR" is 9999.

datetime.UTC

   Alias for the UTC time zone singleton "datetime.timezone.utc".

   Added in version 3.11.

Available Types
===============

class datetime.date

   An idealized naive date, assuming the current Gregorian calendar
   always was, and always will be, in effect. Attributes: "year",
   "month", and "day".

class datetime.time

   An idealized time, independent of any particular day, assuming that
   every day has exactly 24*60*60 seconds. (There is no notion of
   “leap seconds” here.) Attributes: "hour", "minute", "second",
   "microsecond", and "tzinfo".

class datetime.datetime

   A combination of a date and a time. Attributes: "year", "month",
   "day", "hour", "minute", "second", "microsecond", and "tzinfo".

class datetime.timedelta

   A duration expressing the difference between two "datetime" or
   "date" instances to microsecond resolution.

class datetime.tzinfo

   An abstract base class for time zone information objects. These are
   used by the "datetime" and "time" classes to provide a customizable
   notion of time adjustment (for example, to account for time zone
   and/or daylight saving time).

class datetime.timezone

   A class that implements the "tzinfo" abstract base class as a fixed
   offset from the UTC.

   Added in version 3.2.

Objects of these types are immutable.

Subclass relationships:

   object
       timedelta
       tzinfo
           timezone
       time
       date
           datetime

Common Properties
-----------------

The "date", "datetime", "time", and "timezone" types share these
common features:

* Objects of these types are immutable.
* Objects of these types are *hashable*, meaning that they can be used as dictionary keys. * Objects of these types support efficient pickling via the "pickle" module. Determining if an Object is Aware or Naive ------------------------------------------ Objects of the "date" type are always naive. An object of type "time" or "datetime" may be aware or naive. A "datetime" object "d" is aware if both of the following hold: 1. "d.tzinfo" is not "None" 2. "d.tzinfo.utcoffset(d)" does not return "None" Otherwise, "d" is naive. A "time" object "t" is aware if both of the following hold: 1. "t.tzinfo" is not "None" 2. "t.tzinfo.utcoffset(None)" does not return "None". Otherwise, "t" is naive. The distinction between aware and naive doesn’t apply to "timedelta" objects. "timedelta" Objects =================== A "timedelta" object represents a duration, the difference between two "datetime" or "date" instances. class datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0) All arguments are optional and default to 0. Arguments may be integers or floats, and may be positive or negative. Only *days*, *seconds* and *microseconds* are stored internally. Arguments are converted to those units: * A millisecond is converted to 1000 microseconds. * A minute is converted to 60 seconds. * An hour is converted to 3600 seconds. * A week is converted to 7 days. and days, seconds and microseconds are then normalized so that the representation is unique, with * "0 <= microseconds < 1000000" * "0 <= seconds < 3600*24" (the number of seconds in one day) * "-999999999 <= days <= 999999999" The following example illustrates how any arguments besides *days*, *seconds* and *microseconds* are “merged” and normalized into those three resulting attributes: >>> from datetime import timedelta >>> delta = timedelta( ... days=50, ... seconds=27, ... microseconds=10, ... milliseconds=29000, ... minutes=5, ... hours=8, ... weeks=2 ... ) >>> # Only days, seconds, and microseconds remain >>> delta datetime.timedelta(days=64, seconds=29156, microseconds=10) If any argument is a float and there are fractional microseconds, the fractional microseconds left over from all arguments are combined and their sum is rounded to the nearest microsecond using round-half-to-even tiebreaker. If no argument is a float, the conversion and normalization processes are exact (no information is lost). If the normalized value of days lies outside the indicated range, "OverflowError" is raised. Note that normalization of negative values may be surprising at first. For example: >>> from datetime import timedelta >>> d = timedelta(microseconds=-1) >>> (d.days, d.seconds, d.microseconds) (-1, 86399, 999999) Since the string representation of "timedelta" objects can be confusing, use the following recipe to produce a more readable format: >>> def pretty_timedelta(td): ... if td.days >= 0: ... return str(td) ... return f'-({-td!s})' ... >>> d = timedelta(hours=-1) >>> str(d) # not human-friendly '-1 day, 23:00:00' >>> pretty_timedelta(d) '-(1:00:00)' Class attributes: timedelta.min The most negative "timedelta" object, "timedelta(-999999999)". timedelta.max The most positive "timedelta" object, "timedelta(days=999999999, hours=23, minutes=59, seconds=59, microseconds=999999)". timedelta.resolution The smallest possible difference between non-equal "timedelta" objects, "timedelta(microseconds=1)". Note that, because of normalization, "timedelta.max" is greater than "-timedelta.min". 
"-timedelta.max" is not representable as a "timedelta" object. Instance attributes (read-only): timedelta.days Between -999,999,999 and 999,999,999 inclusive. timedelta.seconds Between 0 and 86,399 inclusive. Caution: It is a somewhat common bug for code to unintentionally use this attribute when it is actually intended to get a "total_seconds()" value instead: >>> from datetime import timedelta >>> duration = timedelta(seconds=11235813) >>> duration.days, duration.seconds (130, 3813) >>> duration.total_seconds() 11235813.0 timedelta.microseconds Between 0 and 999,999 inclusive. Supported operations: +----------------------------------+-------------------------------------------------+ | Operation | Result | |==================================|=================================================| | "t1 = t2 + t3" | Sum of "t2" and "t3". Afterwards "t1 - t2 == | | | t3" and "t1 - t3 == t2" are true. (1) | +----------------------------------+-------------------------------------------------+ | "t1 = t2 - t3" | Difference of "t2" and "t3". Afterwards "t1 == | | | t2 - t3" and "t2 == t1 + t3" are true. (1)(6) | +----------------------------------+-------------------------------------------------+ | "t1 = t2 * i or t1 = i * t2" | Delta multiplied by an integer. Afterwards "t1 | | | // i == t2" is true, provided "i != 0". | +----------------------------------+-------------------------------------------------+ | | In general, "t1 * i == t1 * (i-1) + t1" is | | | true. (1) | +----------------------------------+-------------------------------------------------+ | "t1 = t2 * f or t1 = f * t2" | Delta multiplied by a float. The result is | | | rounded to the nearest multiple of | | | timedelta.resolution using round-half-to-even. | +----------------------------------+-------------------------------------------------+ | "f = t2 / t3" | Division (3) of overall duration "t2" by | | | interval unit "t3". Returns a "float" object. | +----------------------------------+-------------------------------------------------+ | "t1 = t2 / f or t1 = t2 / i" | Delta divided by a float or an int. The result | | | is rounded to the nearest multiple of | | | timedelta.resolution using round-half-to-even. | +----------------------------------+-------------------------------------------------+ | "t1 = t2 // i" or "t1 = t2 // | The floor is computed and the remainder (if | | t3" | any) is thrown away. In the second case, an | | | integer is returned. (3) | +----------------------------------+-------------------------------------------------+ | "t1 = t2 % t3" | The remainder is computed as a "timedelta" | | | object. (3) | +----------------------------------+-------------------------------------------------+ | "q, r = divmod(t1, t2)" | Computes the quotient and the remainder: "q = | | | t1 // t2" (3) and "r = t1 % t2". "q" is an | | | integer and "r" is a "timedelta" object. | +----------------------------------+-------------------------------------------------+ | "+t1" | Returns a "timedelta" object with the same | | | value. (2) | +----------------------------------+-------------------------------------------------+ | "-t1" | Equivalent to "timedelta(-t1.days, -t1.seconds, | | | -t1.microseconds)", and to "t1 * -1". (1)(4) | +----------------------------------+-------------------------------------------------+ | "abs(t)" | Equivalent to "+t" when "t.days >= 0", and to | | | "-t" when "t.days < 0". 
(2) | +----------------------------------+-------------------------------------------------+ | "str(t)" | Returns a string in the form "[D day[s], | | | ][H]H:MM:SS[.UUUUUU]", where D is negative for | | | negative "t". (5) | +----------------------------------+-------------------------------------------------+ | "repr(t)" | Returns a string representation of the | | | "timedelta" object as a constructor call with | | | canonical attribute values. | +----------------------------------+-------------------------------------------------+ Notes: 1. This is exact but may overflow. 2. This is exact and cannot overflow. 3. Division by zero raises "ZeroDivisionError". 4. "-timedelta.max" is not representable as a "timedelta" object. 5. String representations of "timedelta" objects are normalized similarly to their internal representation. This leads to somewhat unusual results for negative timedeltas. For example: >>> timedelta(hours=-5) datetime.timedelta(days=-1, seconds=68400) >>> print(_) -1 day, 19:00:00 6. The expression "t2 - t3" will always be equal to the expression "t2 + (-t3)" except when t3 is equal to "timedelta.max"; in that case the former will produce a result while the latter will overflow. In addition to the operations listed above, "timedelta" objects support certain additions and subtractions with "date" and "datetime" objects (see below). Changed in version 3.2: Floor division and true division of a "timedelta" object by another "timedelta" object are now supported, as are remainder operations and the "divmod()" function. True division and multiplication of a "timedelta" object by a "float" object are now supported. "timedelta" objects support equality and order comparisons. In Boolean contexts, a "timedelta" object is considered to be true if and only if it isn’t equal to "timedelta(0)". Instance methods: timedelta.total_seconds() Return the total number of seconds contained in the duration. Equivalent to "td / timedelta(seconds=1)". For interval units other than seconds, use the division form directly (e.g. "td / timedelta(microseconds=1)"). Note that for very large time intervals (greater than 270 years on most platforms) this method will lose microsecond accuracy. Added in version 3.2. Examples of usage: "timedelta" ------------------------------ An additional example of normalization: >>> # Components of another_year add up to exactly 365 days >>> from datetime import timedelta >>> year = timedelta(days=365) >>> another_year = timedelta(weeks=40, days=84, hours=23, ... minutes=50, seconds=600) >>> year == another_year True >>> year.total_seconds() 31536000.0 Examples of "timedelta" arithmetic: >>> from datetime import timedelta >>> year = timedelta(days=365) >>> ten_years = 10 * year >>> ten_years datetime.timedelta(days=3650) >>> ten_years.days // 365 10 >>> nine_years = ten_years - year >>> nine_years datetime.timedelta(days=3285) >>> three_years = nine_years // 3 >>> three_years, three_years.days // 365 (datetime.timedelta(days=1095), 3) "date" Objects ============== A "date" object represents a date (year, month and day) in an idealized calendar, the current Gregorian calendar indefinitely extended in both directions. January 1 of year 1 is called day number 1, January 2 of year 1 is called day number 2, and so on. [2] class datetime.date(year, month, day) All arguments are required. 
Arguments must be integers, in the following ranges: * "MINYEAR <= year <= MAXYEAR" * "1 <= month <= 12" * "1 <= day <= number of days in the given month and year" If an argument outside those ranges is given, "ValueError" is raised. Other constructors, all class methods: classmethod date.today() Return the current local date. This is equivalent to "date.fromtimestamp(time.time())". classmethod date.fromtimestamp(timestamp) Return the local date corresponding to the POSIX timestamp, such as is returned by "time.time()". This may raise "OverflowError", if the timestamp is out of the range of values supported by the platform C "localtime()" function, and "OSError" on "localtime()" failure. It’s common for this to be restricted to years from 1970 through 2038. Note that on non-POSIX systems that include leap seconds in their notion of a timestamp, leap seconds are ignored by "fromtimestamp()". Changed in version 3.3: Raise "OverflowError" instead of "ValueError" if the timestamp is out of the range of values supported by the platform C "localtime()" function. Raise "OSError" instead of "ValueError" on "localtime()" failure. classmethod date.fromordinal(ordinal) Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. "ValueError" is raised unless "1 <= ordinal <= date.max.toordinal()". For any date "d", "date.fromordinal(d.toordinal()) == d". classmethod date.fromisoformat(date_string) Return a "date" corresponding to a *date_string* given in any valid ISO 8601 format, with the following exceptions: 1. Reduced precision dates are not currently supported ("YYYY-MM", "YYYY"). 2. Extended date representations are not currently supported ("±YYYYYY-MM-DD"). 3. Ordinal dates are not currently supported ("YYYY-OOO"). Examples: >>> from datetime import date >>> date.fromisoformat('2019-12-04') datetime.date(2019, 12, 4) >>> date.fromisoformat('20191204') datetime.date(2019, 12, 4) >>> date.fromisoformat('2021-W01-1') datetime.date(2021, 1, 4) Added in version 3.7. Changed in version 3.11: Previously, this method only supported the format "YYYY-MM-DD". classmethod date.fromisocalendar(year, week, day) Return a "date" corresponding to the ISO calendar date specified by year, week and day. This is the inverse of the function "date.isocalendar()". Added in version 3.8. Class attributes: date.min The earliest representable date, "date(MINYEAR, 1, 1)". date.max The latest representable date, "date(MAXYEAR, 12, 31)". date.resolution The smallest possible difference between non-equal date objects, "timedelta(days=1)". Instance attributes (read-only): date.year Between "MINYEAR" and "MAXYEAR" inclusive. date.month Between 1 and 12 inclusive. date.day Between 1 and the number of days in the given month of the given year. Supported operations: +---------------------------------+------------------------------------------------+ | Operation | Result | |=================================|================================================| | "date2 = date1 + timedelta" | "date2" will be "timedelta.days" days after | | | "date1". (1) | +---------------------------------+------------------------------------------------+ | "date2 = date1 - timedelta" | Computes "date2" such that "date2 + timedelta | | | == date1". 
(2) | +---------------------------------+------------------------------------------------+ | "timedelta = date1 - date2" | (3) | +---------------------------------+------------------------------------------------+ | "date1 == date2" "date1 != | Equality comparison. (4) | | date2" | | +---------------------------------+------------------------------------------------+ | "date1 < date2" "date1 > date2" | Order comparison. (5) | | "date1 <= date2" "date1 >= | | | date2" | | +---------------------------------+------------------------------------------------+ Notes: 1. *date2* is moved forward in time if "timedelta.days > 0", or backward if "timedelta.days < 0". Afterward "date2 - date1 == timedelta.days". "timedelta.seconds" and "timedelta.microseconds" are ignored. "OverflowError" is raised if "date2.year" would be smaller than "MINYEAR" or larger than "MAXYEAR". 2. "timedelta.seconds" and "timedelta.microseconds" are ignored. 3. This is exact, and cannot overflow. "timedelta.seconds" and "timedelta.microseconds" are 0, and "date2 + timedelta == date1" after. 4. "date" objects are equal if they represent the same date. "date" objects that are not also "datetime" instances are never equal to "datetime" objects, even if they represent the same date. 5. *date1* is considered less than *date2* when *date1* precedes *date2* in time. In other words, "date1 < date2" if and only if "date1.toordinal() < date2.toordinal()". Order comparison between a "date" object that is not also a "datetime" instance and a "datetime" object raises "TypeError". Changed in version 3.13: Comparison between "datetime" object and an instance of the "date" subclass that is not a "datetime" subclass no longer converts the latter to "date", ignoring the time part and the time zone. The default behavior can be changed by overriding the special comparison methods in subclasses. In Boolean contexts, all "date" objects are considered to be true. Instance methods: date.replace(year=self.year, month=self.month, day=self.day) Return a new "date" object with the same values, but with specified parameters updated. Example: >>> from datetime import date >>> d = date(2002, 12, 31) >>> d.replace(day=26) datetime.date(2002, 12, 26) The generic function "copy.replace()" also supports "date" objects. date.timetuple() Return a "time.struct_time" such as returned by "time.localtime()". The hours, minutes and seconds are 0, and the DST flag is -1. "d.timetuple()" is equivalent to: time.struct_time((d.year, d.month, d.day, 0, 0, 0, d.weekday(), yday, -1)) where "yday = d.toordinal() - date(d.year, 1, 1).toordinal() + 1" is the day number within the current year starting with 1 for January 1st. date.toordinal() Return the proleptic Gregorian ordinal of the date, where January 1 of year 1 has ordinal 1. For any "date" object "d", "date.fromordinal(d.toordinal()) == d". date.weekday() Return the day of the week as an integer, where Monday is 0 and Sunday is 6. For example, "date(2002, 12, 4).weekday() == 2", a Wednesday. See also "isoweekday()". date.isoweekday() Return the day of the week as an integer, where Monday is 1 and Sunday is 7. For example, "date(2002, 12, 4).isoweekday() == 3", a Wednesday. See also "weekday()", "isocalendar()". date.isocalendar() Return a *named tuple* object with three components: "year", "week" and "weekday". The ISO calendar is a widely used variant of the Gregorian calendar. [3] The ISO year consists of 52 or 53 full weeks, and where a week starts on a Monday and ends on a Sunday. 
   The first week of an ISO year is the first (Gregorian) calendar
   week of a year containing a Thursday. This is called week number 1,
   and the ISO year of that Thursday is the same as its Gregorian
   year.

   For example, 2004 begins on a Thursday, so the first week of ISO
   year 2004 begins on Monday, 29 Dec 2003 and ends on Sunday, 4 Jan
   2004:

      >>> from datetime import date
      >>> date(2003, 12, 29).isocalendar()
      datetime.IsoCalendarDate(year=2004, week=1, weekday=1)
      >>> date(2004, 1, 4).isocalendar()
      datetime.IsoCalendarDate(year=2004, week=1, weekday=7)

   Changed in version 3.9: Result changed from a tuple to a *named
   tuple*.

date.isoformat()

   Return a string representing the date in ISO 8601 format,
   "YYYY-MM-DD":

      >>> from datetime import date
      >>> date(2002, 12, 4).isoformat()
      '2002-12-04'

date.__str__()

   For a date "d", "str(d)" is equivalent to "d.isoformat()".

date.ctime()

   Return a string representing the date:

      >>> from datetime import date
      >>> date(2002, 12, 4).ctime()
      'Wed Dec  4 00:00:00 2002'

   "d.ctime()" is equivalent to:

      time.ctime(time.mktime(d.timetuple()))

   on platforms where the native C "ctime()" function (which
   "time.ctime()" invokes, but which "date.ctime()" does not invoke)
   conforms to the C standard.

date.strftime(format)

   Return a string representing the date, controlled by an explicit
   format string. Format codes referring to hours, minutes or seconds
   will see 0 values. See also strftime() and strptime() Behavior and
   "date.isoformat()".

date.__format__(format)

   Same as "date.strftime()". This makes it possible to specify a
   format string for a "date" object in formatted string literals and
   when using "str.format()". See also strftime() and strptime()
   Behavior and "date.isoformat()".

Examples of Usage: "date"
-------------------------

Example of counting days to an event:

   >>> import time
   >>> from datetime import date
   >>> today = date.today()
   >>> today
   datetime.date(2007, 12, 5)
   >>> today == date.fromtimestamp(time.time())
   True
   >>> my_birthday = date(today.year, 6, 24)
   >>> if my_birthday < today:
   ...     my_birthday = my_birthday.replace(year=today.year + 1)
   ...
   >>> my_birthday
   datetime.date(2008, 6, 24)
   >>> time_to_birthday = abs(my_birthday - today)
   >>> time_to_birthday.days
   202

More examples of working with "date":

   >>> from datetime import date
   >>> d = date.fromordinal(730920)  # 730920th day after 1. 1. 0001
   >>> d
   datetime.date(2002, 3, 11)

   >>> # Methods related to formatting string output
   >>> d.isoformat()
   '2002-03-11'
   >>> d.strftime("%d/%m/%y")
   '11/03/02'
   >>> d.strftime("%A %d. %B %Y")
   'Monday 11. March 2002'
   >>> d.ctime()
   'Mon Mar 11 00:00:00 2002'
   >>> 'The {1} is {0:%d}, the {2} is {0:%B}.'.format(d, "day", "month")
   'The day is 11, the month is March.'

   >>> # Methods for extracting 'components' under different calendars
   >>> t = d.timetuple()
   >>> for i in t:
   ...     print(i)
   2002                # year
   3                   # month
   11                  # day
   0
   0
   0
   0                   # weekday (0 = Monday)
   70                  # 70th day in the year
   -1
   >>> ic = d.isocalendar()
   >>> for i in ic:
   ...     print(i)
   2002                # ISO year
   11                  # ISO week number
   1                   # ISO day number ( 1 = Monday )

   >>> # A date object is immutable; all operations produce a new object
   >>> d.replace(year=2005)
   datetime.date(2005, 3, 11)

"datetime" Objects
==================

A "datetime" object is a single object containing all the information
from a "date" object and a "time" object.

Like a "date" object, "datetime" assumes the current Gregorian
calendar extended in both directions; like a "time" object, "datetime"
assumes there are exactly 3600*24 seconds in every day.
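For instance, a quick illustration of that combination, taking a
"datetime" apart into its "date" and "time" components (a sketch of a
typical interactive session):

   >>> from datetime import datetime
   >>> dt = datetime(2002, 12, 4, 20, 30, 40)
   >>> dt.date()
   datetime.date(2002, 12, 4)
   >>> dt.time()
   datetime.time(20, 30, 40)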
Constructor: class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0) The *year*, *month* and *day* arguments are required. *tzinfo* may be "None", or an instance of a "tzinfo" subclass. The remaining arguments must be integers in the following ranges: * "MINYEAR <= year <= MAXYEAR", * "1 <= month <= 12", * "1 <= day <= number of days in the given month and year", * "0 <= hour < 24", * "0 <= minute < 60", * "0 <= second < 60", * "0 <= microsecond < 1000000", * "fold in [0, 1]". If an argument outside those ranges is given, "ValueError" is raised. Changed in version 3.6: Added the *fold* parameter. Other constructors, all class methods: classmethod datetime.today() Return the current local date and time, with "tzinfo" "None". Equivalent to: datetime.fromtimestamp(time.time()) See also "now()", "fromtimestamp()". This method is functionally equivalent to "now()", but without a "tz" parameter. classmethod datetime.now(tz=None) Return the current local date and time. If optional argument *tz* is "None" or not specified, this is like "today()", but, if possible, supplies more precision than can be gotten from going through a "time.time()" timestamp (for example, this may be possible on platforms supplying the C "gettimeofday()" function). If *tz* is not "None", it must be an instance of a "tzinfo" subclass, and the current date and time are converted to *tz*’s time zone. This function is preferred over "today()" and "utcnow()". Note: Subsequent calls to "datetime.now()" may return the same instant depending on the precision of the underlying clock. classmethod datetime.utcnow() Return the current UTC date and time, with "tzinfo" "None". This is like "now()", but returns the current UTC date and time, as a naive "datetime" object. An aware current UTC datetime can be obtained by calling "datetime.now(timezone.utc)". See also "now()". Warning: Because naive "datetime" objects are treated by many "datetime" methods as local times, it is preferred to use aware datetimes to represent times in UTC. As such, the recommended way to create an object representing the current time in UTC is by calling "datetime.now(timezone.utc)". Deprecated since version 3.12: Use "datetime.now()" with "UTC" instead. classmethod datetime.fromtimestamp(timestamp, tz=None) Return the local date and time corresponding to the POSIX timestamp, such as is returned by "time.time()". If optional argument *tz* is "None" or not specified, the timestamp is converted to the platform’s local date and time, and the returned "datetime" object is naive. If *tz* is not "None", it must be an instance of a "tzinfo" subclass, and the timestamp is converted to *tz*’s time zone. "fromtimestamp()" may raise "OverflowError", if the timestamp is out of the range of values supported by the platform C "localtime()" or "gmtime()" functions, and "OSError" on "localtime()" or "gmtime()" failure. It’s common for this to be restricted to years in 1970 through 2038. Note that on non-POSIX systems that include leap seconds in their notion of a timestamp, leap seconds are ignored by "fromtimestamp()", and then it’s possible to have two timestamps differing by a second that yield identical "datetime" objects. This method is preferred over "utcfromtimestamp()". Changed in version 3.3: Raise "OverflowError" instead of "ValueError" if the timestamp is out of the range of values supported by the platform C "localtime()" or "gmtime()" functions. 
Raise "OSError" instead of "ValueError" on "localtime()" or "gmtime()" failure. Changed in version 3.6: "fromtimestamp()" may return instances with "fold" set to 1. classmethod datetime.utcfromtimestamp(timestamp) Return the UTC "datetime" corresponding to the POSIX timestamp, with "tzinfo" "None". (The resulting object is naive.) This may raise "OverflowError", if the timestamp is out of the range of values supported by the platform C "gmtime()" function, and "OSError" on "gmtime()" failure. It’s common for this to be restricted to years in 1970 through 2038. To get an aware "datetime" object, call "fromtimestamp()": datetime.fromtimestamp(timestamp, timezone.utc) On the POSIX compliant platforms, it is equivalent to the following expression: datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=timestamp) except the latter formula always supports the full years range: between "MINYEAR" and "MAXYEAR" inclusive. Warning: Because naive "datetime" objects are treated by many "datetime" methods as local times, it is preferred to use aware datetimes to represent times in UTC. As such, the recommended way to create an object representing a specific timestamp in UTC is by calling "datetime.fromtimestamp(timestamp, tz=timezone.utc)". Changed in version 3.3: Raise "OverflowError" instead of "ValueError" if the timestamp is out of the range of values supported by the platform C "gmtime()" function. Raise "OSError" instead of "ValueError" on "gmtime()" failure. Deprecated since version 3.12: Use "datetime.fromtimestamp()" with "UTC" instead. classmethod datetime.fromordinal(ordinal) Return the "datetime" corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. "ValueError" is raised unless "1 <= ordinal <= datetime.max.toordinal()". The hour, minute, second and microsecond of the result are all 0, and "tzinfo" is "None". classmethod datetime.combine(date, time, tzinfo=time.tzinfo) Return a new "datetime" object whose date components are equal to the given "date" object’s, and whose time components are equal to the given "time" object’s. If the *tzinfo* argument is provided, its value is used to set the "tzinfo" attribute of the result, otherwise the "tzinfo" attribute of the *time* argument is used. If the *date* argument is a "datetime" object, its time components and "tzinfo" attributes are ignored. For any "datetime" object "d", "d == datetime.combine(d.date(), d.time(), d.tzinfo)". Changed in version 3.6: Added the *tzinfo* argument. classmethod datetime.fromisoformat(date_string) Return a "datetime" corresponding to a *date_string* in any valid ISO 8601 format, with the following exceptions: 1. Time zone offsets may have fractional seconds. 2. The "T" separator may be replaced by any single unicode character. 3. Fractional hours and minutes are not supported. 4. Reduced precision dates are not currently supported ("YYYY-MM", "YYYY"). 5. Extended date representations are not currently supported ("±YYYYYY-MM-DD"). 6. Ordinal dates are not currently supported ("YYYY-OOO"). 
Examples: >>> from datetime import datetime >>> datetime.fromisoformat('2011-11-04') datetime.datetime(2011, 11, 4, 0, 0) >>> datetime.fromisoformat('20111104') datetime.datetime(2011, 11, 4, 0, 0) >>> datetime.fromisoformat('2011-11-04T00:05:23') datetime.datetime(2011, 11, 4, 0, 5, 23) >>> datetime.fromisoformat('2011-11-04T00:05:23Z') datetime.datetime(2011, 11, 4, 0, 5, 23, tzinfo=datetime.timezone.utc) >>> datetime.fromisoformat('20111104T000523') datetime.datetime(2011, 11, 4, 0, 5, 23) >>> datetime.fromisoformat('2011-W01-2T00:05:23.283') datetime.datetime(2011, 1, 4, 0, 5, 23, 283000) >>> datetime.fromisoformat('2011-11-04 00:05:23.283') datetime.datetime(2011, 11, 4, 0, 5, 23, 283000) >>> datetime.fromisoformat('2011-11-04 00:05:23.283+00:00') datetime.datetime(2011, 11, 4, 0, 5, 23, 283000, tzinfo=datetime.timezone.utc) >>> datetime.fromisoformat('2011-11-04T00:05:23+04:00') datetime.datetime(2011, 11, 4, 0, 5, 23, tzinfo=datetime.timezone(datetime.timedelta(seconds=14400))) Added in version 3.7. Changed in version 3.11: Previously, this method only supported formats that could be emitted by "date.isoformat()" or "datetime.isoformat()". classmethod datetime.fromisocalendar(year, week, day) Return a "datetime" corresponding to the ISO calendar date specified by year, week and day. The non-date components of the datetime are populated with their normal default values. This is the inverse of the function "datetime.isocalendar()". Added in version 3.8. classmethod datetime.strptime(date_string, format) Return a "datetime" corresponding to *date_string*, parsed according to *format*. If *format* does not contain microseconds or time zone information, this is equivalent to: datetime(*(time.strptime(date_string, format)[0:6])) "ValueError" is raised if the date_string and format can’t be parsed by "time.strptime()" or if it returns a value which isn’t a time tuple. See also strftime() and strptime() Behavior and "datetime.fromisoformat()". Changed in version 3.13: If *format* specifies a day of month without a year a "DeprecationWarning" is now emitted. This is to avoid a quadrennial leap year bug in code seeking to parse only a month and day as the default year used in absence of one in the format is not a leap year. Such *format* values may raise an error as of Python 3.15. The workaround is to always include a year in your *format*. If parsing *date_string* values that do not have a year, explicitly add a year that is a leap year before parsing: >>> from datetime import datetime >>> date_string = "02/29" >>> when = datetime.strptime(f"{date_string};1984", "%m/%d;%Y") # Avoids leap year bug. >>> when.strftime("%B %d") 'February 29' Class attributes: datetime.min The earliest representable "datetime", "datetime(MINYEAR, 1, 1, tzinfo=None)". datetime.max The latest representable "datetime", "datetime(MAXYEAR, 12, 31, 23, 59, 59, 999999, tzinfo=None)". datetime.resolution The smallest possible difference between non-equal "datetime" objects, "timedelta(microseconds=1)". Instance attributes (read-only): datetime.year Between "MINYEAR" and "MAXYEAR" inclusive. datetime.month Between 1 and 12 inclusive. datetime.day Between 1 and the number of days in the given month of the given year. datetime.hour In "range(24)". datetime.minute In "range(60)". datetime.second In "range(60)". datetime.microsecond In "range(1000000)". datetime.tzinfo The object passed as the *tzinfo* argument to the "datetime" constructor, or "None" if none was passed. datetime.fold In "[0, 1]". 
   Used to disambiguate wall times during a repeated interval. (A repeated interval occurs when clocks are rolled back at the end of daylight saving time or when the UTC offset for the current zone is decreased for political reasons.) The values 0 and 1 represent, respectively, the earlier and later of the two moments with the same wall time representation.

   Added in version 3.6.

Supported operations:

+-----------------------------------------+----------------------------------+
| Operation                               | Result                           |
|=========================================|==================================|
| "datetime2 = datetime1 + timedelta"     | (1)                              |
+-----------------------------------------+----------------------------------+
| "datetime2 = datetime1 - timedelta"     | (2)                              |
+-----------------------------------------+----------------------------------+
| "timedelta = datetime1 - datetime2"     | (3)                              |
+-----------------------------------------+----------------------------------+
| "datetime1 == datetime2" "datetime1 !=  | Equality comparison. (4)         |
| datetime2"                              |                                  |
+-----------------------------------------+----------------------------------+
| "datetime1 < datetime2" "datetime1 >    | Order comparison. (5)            |
| datetime2" "datetime1 <= datetime2"     |                                  |
| "datetime1 >= datetime2"                |                                  |
+-----------------------------------------+----------------------------------+

1. "datetime2" is a duration of "timedelta" removed from "datetime1", moving forward in time if "timedelta.days > 0", or backward if "timedelta.days < 0". The result has the same "tzinfo" attribute as the input datetime, and "datetime2 - datetime1 == timedelta" after. "OverflowError" is raised if "datetime2.year" would be smaller than "MINYEAR" or larger than "MAXYEAR". Note that no time zone adjustments are done even if the input is an aware object.

2. Computes the "datetime2" such that "datetime2 + timedelta == datetime1". As for addition, the result has the same "tzinfo" attribute as the input datetime, and no time zone adjustments are done even if the input is aware.

3. Subtraction of a "datetime" from a "datetime" is defined only if both operands are naive, or if both are aware. If one is aware and the other is naive, "TypeError" is raised.

   If both are naive, or both are aware and have the same "tzinfo" attribute, the "tzinfo" attributes are ignored, and the result is a "timedelta" object "t" such that "datetime2 + t == datetime1". No time zone adjustments are done in this case.

   If both are aware and have different "tzinfo" attributes, "a-b" acts as if "a" and "b" were first converted to naive UTC datetimes. The result is "(a.replace(tzinfo=None) - a.utcoffset()) - (b.replace(tzinfo=None) - b.utcoffset())" except that the implementation never overflows.

4. "datetime" objects are equal if they represent the same date and time, taking into account the time zone. Naive and aware "datetime" objects are never equal.

   If both comparands are aware, and have the same "tzinfo" attribute, the "tzinfo" and "fold" attributes are ignored and the base datetimes are compared. If both comparands are aware and have different "tzinfo" attributes, the comparison acts as if the comparands were first converted to UTC datetimes, except that the implementation never overflows. "datetime" instances in a repeated interval are never equal to "datetime" instances in other time zones.

5. *datetime1* is considered less than *datetime2* when *datetime1* precedes *datetime2* in time, taking into account the time zone.

   Order comparison between naive and aware "datetime" objects raises "TypeError".

   If both comparands are aware, and have the same "tzinfo" attribute, the "tzinfo" and "fold" attributes are ignored and the base datetimes are compared. If both comparands are aware and have different "tzinfo" attributes, the comparison acts as if the comparands were first converted to UTC datetimes, except that the implementation never overflows.

Changed in version 3.3: Equality comparisons between aware and naive "datetime" instances don’t raise "TypeError".

Changed in version 3.13: Comparison between a "datetime" object and an instance of a "date" subclass that is not a "datetime" subclass no longer converts the latter to "date", ignoring the time part and the time zone. The default behavior can be changed by overriding the special comparison methods in subclasses.

Instance methods:

datetime.date()
   Return a "date" object with the same year, month and day.

datetime.time()
   Return a "time" object with the same hour, minute, second, microsecond and fold. "tzinfo" is "None". See also method "timetz()".

   Changed in version 3.6: The fold value is copied to the returned "time" object.

datetime.timetz()
   Return a "time" object with the same hour, minute, second, microsecond, fold, and tzinfo attributes. See also method "time()".

   Changed in version 3.6: The fold value is copied to the returned "time" object.

datetime.replace(year=self.year, month=self.month, day=self.day, hour=self.hour, minute=self.minute, second=self.second, microsecond=self.microsecond, tzinfo=self.tzinfo, *, fold=0)
   Return a new "datetime" object with the same attributes, but with specified parameters updated. Note that "tzinfo=None" can be specified to create a naive datetime from an aware datetime with no conversion of date and time data.

   "datetime" objects are also supported by the generic function "copy.replace()".

   Changed in version 3.6: Added the *fold* parameter.

datetime.astimezone(tz=None)
   Return a "datetime" object with new "tzinfo" attribute *tz*, adjusting the date and time data so the result is the same UTC time as *self*, but in *tz*’s local time.

   If provided, *tz* must be an instance of a "tzinfo" subclass, and its "utcoffset()" and "dst()" methods must not return "None". If *self* is naive, it is presumed to represent time in the system time zone.

   If called without arguments (or with "tz=None"), the system local time zone is assumed for the target time zone. The ".tzinfo" attribute of the converted datetime instance will be set to an instance of "timezone" with the zone name and offset obtained from the OS.

   If "self.tzinfo" is *tz*, "self.astimezone(tz)" is equal to *self*: no adjustment of date or time data is performed. Else the result is local time in the time zone *tz*, representing the same UTC time as *self*: after "astz = dt.astimezone(tz)", "astz - astz.utcoffset()" will have the same date and time data as "dt - dt.utcoffset()".

   If you merely want to attach a "timezone" object *tz* to a datetime *dt* without adjustment of date and time data, use "dt.replace(tzinfo=tz)". If you merely want to remove the "timezone" object from an aware datetime *dt* without conversion of date and time data, use "dt.replace(tzinfo=None)".

   Note that the default "tzinfo.fromutc()" method can be overridden in a "tzinfo" subclass to affect the result returned by "astimezone()". Ignoring error cases, "astimezone()" acts like:

      def astimezone(self, tz):
          if self.tzinfo is tz:
              return self
          # Convert self to UTC, and attach the new timezone object.
          utc = (self - self.utcoffset()).replace(tzinfo=tz)
          # Convert from UTC to tz's local time.
          return tz.fromutc(utc)

   Changed in version 3.3: *tz* now can be omitted.

   Changed in version 3.6: The "astimezone()" method can now be called on naive instances that are presumed to represent system local time.

datetime.utcoffset()
   If "tzinfo" is "None", returns "None", else returns "self.tzinfo.utcoffset(self)", and raises an exception if the latter doesn’t return "None" or a "timedelta" object with magnitude less than one day.

   Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes.

datetime.dst()
   If "tzinfo" is "None", returns "None", else returns "self.tzinfo.dst(self)", and raises an exception if the latter doesn’t return "None" or a "timedelta" object with magnitude less than one day.

   Changed in version 3.7: The DST offset is not restricted to a whole number of minutes.

datetime.tzname()
   If "tzinfo" is "None", returns "None", else returns "self.tzinfo.tzname(self)", and raises an exception if the latter doesn’t return "None" or a string object.

datetime.timetuple()
   Return a "time.struct_time" such as returned by "time.localtime()".

   "d.timetuple()" is equivalent to:

      time.struct_time((d.year, d.month, d.day, d.hour, d.minute, d.second, d.weekday(), yday, dst))

   where "yday = d.toordinal() - date(d.year, 1, 1).toordinal() + 1" is the day number within the current year starting with 1 for January 1st. The "tm_isdst" flag of the result is set according to the "dst()" method: if "tzinfo" is "None" or "dst()" returns "None", "tm_isdst" is set to "-1"; else if "dst()" returns a non-zero value, "tm_isdst" is set to 1; else "tm_isdst" is set to 0.

datetime.utctimetuple()
   If "datetime" instance "d" is naive, this is the same as "d.timetuple()" except that "tm_isdst" is forced to 0 regardless of what "d.dst()" returns. DST is never in effect for a UTC time.

   If "d" is aware, "d" is normalized to UTC time, by subtracting "d.utcoffset()", and a "time.struct_time" for the normalized time is returned. "tm_isdst" is forced to 0. Note that an "OverflowError" may be raised if "d.year" was "MINYEAR" or "MAXYEAR" and UTC adjustment spills over a year boundary.

   Warning: Because naive "datetime" objects are treated by many "datetime" methods as local times, it is preferred to use aware datetimes to represent times in UTC; as a result, using "datetime.utctimetuple()" may give misleading results. If you have a naive "datetime" representing UTC, use "datetime.replace(tzinfo=timezone.utc)" to make it aware, at which point you can use "datetime.timetuple()".

datetime.toordinal()
   Return the proleptic Gregorian ordinal of the date. The same as "self.date().toordinal()".

datetime.timestamp()
   Return POSIX timestamp corresponding to the "datetime" instance. The return value is a "float" similar to that returned by "time.time()".

   Naive "datetime" instances are assumed to represent local time and this method relies on the platform C "mktime()" function to perform the conversion. Since "datetime" supports a wider range of values than "mktime()" on many platforms, this method may raise "OverflowError" or "OSError" for times far in the past or far in the future.

   For aware "datetime" instances, the return value is computed as:

      (dt - datetime(1970, 1, 1, tzinfo=timezone.utc)).total_seconds()

   Added in version 3.3.

   Changed in version 3.6: The "timestamp()" method uses the "fold" attribute to disambiguate the times during a repeated interval.

   Note: There is no method to obtain the POSIX timestamp directly from a naive "datetime" instance representing UTC time.
If your application uses this convention and your system time zone is not set to UTC, you can obtain the POSIX timestamp by supplying "tzinfo=timezone.utc": timestamp = dt.replace(tzinfo=timezone.utc).timestamp() or by calculating the timestamp directly: timestamp = (dt - datetime(1970, 1, 1)) / timedelta(seconds=1) datetime.weekday() Return the day of the week as an integer, where Monday is 0 and Sunday is 6. The same as "self.date().weekday()". See also "isoweekday()". datetime.isoweekday() Return the day of the week as an integer, where Monday is 1 and Sunday is 7. The same as "self.date().isoweekday()". See also "weekday()", "isocalendar()". datetime.isocalendar() Return a *named tuple* with three components: "year", "week" and "weekday". The same as "self.date().isocalendar()". datetime.isoformat(sep='T', timespec='auto') Return a string representing the date and time in ISO 8601 format: * "YYYY-MM-DDTHH:MM:SS.ffffff", if "microsecond" is not 0 * "YYYY-MM-DDTHH:MM:SS", if "microsecond" is 0 If "utcoffset()" does not return "None", a string is appended, giving the UTC offset: * "YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM[:SS[.ffffff]]", if "microsecond" is not 0 * "YYYY-MM-DDTHH:MM:SS+HH:MM[:SS[.ffffff]]", if "microsecond" is 0 Examples: >>> from datetime import datetime, timezone >>> datetime(2019, 5, 18, 15, 17, 8, 132263).isoformat() '2019-05-18T15:17:08.132263' >>> datetime(2019, 5, 18, 15, 17, tzinfo=timezone.utc).isoformat() '2019-05-18T15:17:00+00:00' The optional argument *sep* (default "'T'") is a one-character separator, placed between the date and time portions of the result. For example: >>> from datetime import tzinfo, timedelta, datetime >>> class TZ(tzinfo): ... """A time zone with an arbitrary, constant -06:39 offset.""" ... def utcoffset(self, dt): ... return timedelta(hours=-6, minutes=-39) ... >>> datetime(2002, 12, 25, tzinfo=TZ()).isoformat(' ') '2002-12-25 00:00:00-06:39' >>> datetime(2009, 11, 27, microsecond=100, tzinfo=TZ()).isoformat() '2009-11-27T00:00:00.000100-06:39' The optional argument *timespec* specifies the number of additional components of the time to include (the default is "'auto'"). It can be one of the following: * "'auto'": Same as "'seconds'" if "microsecond" is 0, same as "'microseconds'" otherwise. * "'hours'": Include the "hour" in the two-digit "HH" format. * "'minutes'": Include "hour" and "minute" in "HH:MM" format. * "'seconds'": Include "hour", "minute", and "second" in "HH:MM:SS" format. * "'milliseconds'": Include full time, but truncate fractional second part to milliseconds. "HH:MM:SS.sss" format. * "'microseconds'": Include full time in "HH:MM:SS.ffffff" format. Note: Excluded time components are truncated, not rounded. "ValueError" will be raised on an invalid *timespec* argument: >>> from datetime import datetime >>> datetime.now().isoformat(timespec='minutes') '2002-12-25T00:00' >>> dt = datetime(2015, 1, 1, 12, 30, 59, 0) >>> dt.isoformat(timespec='microseconds') '2015-01-01T12:30:59.000000' Changed in version 3.6: Added the *timespec* parameter. datetime.__str__() For a "datetime" instance "d", "str(d)" is equivalent to "d.isoformat(' ')". datetime.ctime() Return a string representing the date and time: >>> from datetime import datetime >>> datetime(2002, 12, 4, 20, 30, 40).ctime() 'Wed Dec 4 20:30:40 2002' The output string will *not* include time zone information, regardless of whether the input is aware or naive. 
"d.ctime()" is equivalent to: time.ctime(time.mktime(d.timetuple())) on platforms where the native C "ctime()" function (which "time.ctime()" invokes, but which "datetime.ctime()" does not invoke) conforms to the C standard. datetime.strftime(format) Return a string representing the date and time, controlled by an explicit format string. See also strftime() and strptime() Behavior and "datetime.isoformat()". datetime.__format__(format) Same as "datetime.strftime()". This makes it possible to specify a format string for a "datetime" object in formatted string literals and when using "str.format()". See also strftime() and strptime() Behavior and "datetime.isoformat()". Examples of Usage: "datetime" ----------------------------- Examples of working with "datetime" objects: >>> from datetime import datetime, date, time, timezone >>> # Using datetime.combine() >>> d = date(2005, 7, 14) >>> t = time(12, 30) >>> datetime.combine(d, t) datetime.datetime(2005, 7, 14, 12, 30) >>> # Using datetime.now() >>> datetime.now() datetime.datetime(2007, 12, 6, 16, 29, 43, 79043) # GMT +1 >>> datetime.now(timezone.utc) datetime.datetime(2007, 12, 6, 15, 29, 43, 79060, tzinfo=datetime.timezone.utc) >>> # Using datetime.strptime() >>> dt = datetime.strptime("21/11/06 16:30", "%d/%m/%y %H:%M") >>> dt datetime.datetime(2006, 11, 21, 16, 30) >>> # Using datetime.timetuple() to get tuple of all attributes >>> tt = dt.timetuple() >>> for it in tt: ... print(it) ... 2006 # year 11 # month 21 # day 16 # hour 30 # minute 0 # second 1 # weekday (0 = Monday) 325 # number of days since 1st January -1 # dst - method tzinfo.dst() returned None >>> # Date in ISO format >>> ic = dt.isocalendar() >>> for it in ic: ... print(it) ... 2006 # ISO year 47 # ISO week 2 # ISO weekday >>> # Formatting a datetime >>> dt.strftime("%A, %d. %B %Y %I:%M%p") 'Tuesday, 21. November 2006 04:30PM' >>> 'The {1} is {0:%d}, the {2} is {0:%B}, the {3} is {0:%I:%M%p}.'.format(dt, "day", "month", "time") 'The day is 21, the month is November, the time is 04:30PM.' The example below defines a "tzinfo" subclass capturing time zone information for Kabul, Afghanistan, which used +4 UTC until 1945 and then +4:30 UTC thereafter: from datetime import timedelta, datetime, tzinfo, timezone class KabulTz(tzinfo): # Kabul used +4 until 1945, when they moved to +4:30 UTC_MOVE_DATE = datetime(1944, 12, 31, 20, tzinfo=timezone.utc) def utcoffset(self, dt): if dt.year < 1945: return timedelta(hours=4) elif (1945, 1, 1, 0, 0) <= dt.timetuple()[:5] < (1945, 1, 1, 0, 30): # An ambiguous ("imaginary") half-hour range representing # a 'fold' in time due to the shift from +4 to +4:30. # If dt falls in the imaginary range, use fold to decide how # to resolve. See PEP495. return timedelta(hours=4, minutes=(30 if dt.fold else 0)) else: return timedelta(hours=4, minutes=30) def fromutc(self, dt): # Follow same validations as in datetime.tzinfo if not isinstance(dt, datetime): raise TypeError("fromutc() requires a datetime argument") if dt.tzinfo is not self: raise ValueError("dt.tzinfo is not self") # A custom implementation is required for fromutc as # the input to this function is a datetime with utc values # but with a tzinfo set to self. # See datetime.astimezone or fromtimestamp. if dt.replace(tzinfo=timezone.utc) >= self.UTC_MOVE_DATE: return dt + timedelta(hours=4, minutes=30) else: return dt + timedelta(hours=4) def dst(self, dt): # Kabul does not observe daylight saving time. 
return timedelta(0) def tzname(self, dt): if dt >= self.UTC_MOVE_DATE: return "+04:30" return "+04" Usage of "KabulTz" from above: >>> tz1 = KabulTz() >>> # Datetime before the change >>> dt1 = datetime(1900, 11, 21, 16, 30, tzinfo=tz1) >>> print(dt1.utcoffset()) 4:00:00 >>> # Datetime after the change >>> dt2 = datetime(2006, 6, 14, 13, 0, tzinfo=tz1) >>> print(dt2.utcoffset()) 4:30:00 >>> # Convert datetime to another time zone >>> dt3 = dt2.astimezone(timezone.utc) >>> dt3 datetime.datetime(2006, 6, 14, 8, 30, tzinfo=datetime.timezone.utc) >>> dt2 datetime.datetime(2006, 6, 14, 13, 0, tzinfo=KabulTz()) >>> dt2 == dt3 True "time" Objects ============== A "time" object represents a (local) time of day, independent of any particular day, and subject to adjustment via a "tzinfo" object. class datetime.time(hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0) All arguments are optional. *tzinfo* may be "None", or an instance of a "tzinfo" subclass. The remaining arguments must be integers in the following ranges: * "0 <= hour < 24", * "0 <= minute < 60", * "0 <= second < 60", * "0 <= microsecond < 1000000", * "fold in [0, 1]". If an argument outside those ranges is given, "ValueError" is raised. All default to 0 except *tzinfo*, which defaults to "None". Class attributes: time.min The earliest representable "time", "time(0, 0, 0, 0)". time.max The latest representable "time", "time(23, 59, 59, 999999)". time.resolution The smallest possible difference between non-equal "time" objects, "timedelta(microseconds=1)", although note that arithmetic on "time" objects is not supported. Instance attributes (read-only): time.hour In "range(24)". time.minute In "range(60)". time.second In "range(60)". time.microsecond In "range(1000000)". time.tzinfo The object passed as the tzinfo argument to the "time" constructor, or "None" if none was passed. time.fold In "[0, 1]". Used to disambiguate wall times during a repeated interval. (A repeated interval occurs when clocks are rolled back at the end of daylight saving time or when the UTC offset for the current zone is decreased for political reasons.) The values 0 and 1 represent, respectively, the earlier and later of the two moments with the same wall time representation. Added in version 3.6. "time" objects support equality and order comparisons, where "a" is considered less than "b" when "a" precedes "b" in time. Naive and aware "time" objects are never equal. Order comparison between naive and aware "time" objects raises "TypeError". If both comparands are aware, and have the same "tzinfo" attribute, the "tzinfo" and "fold" attributes are ignored and the base times are compared. If both comparands are aware and have different "tzinfo" attributes, the comparands are first adjusted by subtracting their UTC offsets (obtained from "self.utcoffset()"). Changed in version 3.3: Equality comparisons between aware and naive "time" instances don’t raise "TypeError". In Boolean contexts, a "time" object is always considered to be true. Changed in version 3.5: Before Python 3.5, a "time" object was considered to be false if it represented midnight in UTC. This behavior was considered obscure and error-prone and has been removed in Python 3.5. See bpo-13936 for full details. Other constructor: classmethod time.fromisoformat(time_string) Return a "time" corresponding to a *time_string* in any valid ISO 8601 format, with the following exceptions: 1. Time zone offsets may have fractional seconds. 2. 
The leading "T", normally required in cases where there may be ambiguity between a date and a time, is not required. 3. Fractional seconds may have any number of digits (anything beyond 6 will be truncated). 4. Fractional hours and minutes are not supported. Examples: >>> from datetime import time >>> time.fromisoformat('04:23:01') datetime.time(4, 23, 1) >>> time.fromisoformat('T04:23:01') datetime.time(4, 23, 1) >>> time.fromisoformat('T042301') datetime.time(4, 23, 1) >>> time.fromisoformat('04:23:01.000384') datetime.time(4, 23, 1, 384) >>> time.fromisoformat('04:23:01,000384') datetime.time(4, 23, 1, 384) >>> time.fromisoformat('04:23:01+04:00') datetime.time(4, 23, 1, tzinfo=datetime.timezone(datetime.timedelta(seconds=14400))) >>> time.fromisoformat('04:23:01Z') datetime.time(4, 23, 1, tzinfo=datetime.timezone.utc) >>> time.fromisoformat('04:23:01+00:00') datetime.time(4, 23, 1, tzinfo=datetime.timezone.utc) Added in version 3.7. Changed in version 3.11: Previously, this method only supported formats that could be emitted by "time.isoformat()". Instance methods: time.replace(hour=self.hour, minute=self.minute, second=self.second, microsecond=self.microsecond, tzinfo=self.tzinfo, *, fold=0) Return a new "time" with the same values, but with specified parameters updated. Note that "tzinfo=None" can be specified to create a naive "time" from an aware "time", without conversion of the time data. "time" objects are also supported by generic function "copy.replace()". Changed in version 3.6: Added the *fold* parameter. time.isoformat(timespec='auto') Return a string representing the time in ISO 8601 format, one of: * "HH:MM:SS.ffffff", if "microsecond" is not 0 * "HH:MM:SS", if "microsecond" is 0 * "HH:MM:SS.ffffff+HH:MM[:SS[.ffffff]]", if "utcoffset()" does not return "None" * "HH:MM:SS+HH:MM[:SS[.ffffff]]", if "microsecond" is 0 and "utcoffset()" does not return "None" The optional argument *timespec* specifies the number of additional components of the time to include (the default is "'auto'"). It can be one of the following: * "'auto'": Same as "'seconds'" if "microsecond" is 0, same as "'microseconds'" otherwise. * "'hours'": Include the "hour" in the two-digit "HH" format. * "'minutes'": Include "hour" and "minute" in "HH:MM" format. * "'seconds'": Include "hour", "minute", and "second" in "HH:MM:SS" format. * "'milliseconds'": Include full time, but truncate fractional second part to milliseconds. "HH:MM:SS.sss" format. * "'microseconds'": Include full time in "HH:MM:SS.ffffff" format. Note: Excluded time components are truncated, not rounded. "ValueError" will be raised on an invalid *timespec* argument. Example: >>> from datetime import time >>> time(hour=12, minute=34, second=56, microsecond=123456).isoformat(timespec='minutes') '12:34' >>> dt = time(hour=12, minute=34, second=56, microsecond=0) >>> dt.isoformat(timespec='microseconds') '12:34:56.000000' >>> dt.isoformat(timespec='auto') '12:34:56' Changed in version 3.6: Added the *timespec* parameter. time.__str__() For a time "t", "str(t)" is equivalent to "t.isoformat()". time.strftime(format) Return a string representing the time, controlled by an explicit format string. See also strftime() and strptime() Behavior and "time.isoformat()". time.__format__(format) Same as "time.strftime()". This makes it possible to specify a format string for a "time" object in formatted string literals and when using "str.format()". See also strftime() and strptime() Behavior and "time.isoformat()". 
time.utcoffset() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.utcoffset(None)", and raises an exception if the latter doesn’t return "None" or a "timedelta" object with magnitude less than one day. Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. time.dst() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.dst(None)", and raises an exception if the latter doesn’t return "None", or a "timedelta" object with magnitude less than one day. Changed in version 3.7: The DST offset is not restricted to a whole number of minutes. time.tzname() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.tzname(None)", or raises an exception if the latter doesn’t return "None" or a string object. Examples of Usage: "time" ------------------------- Examples of working with a "time" object: >>> from datetime import time, tzinfo, timedelta >>> class TZ1(tzinfo): ... def utcoffset(self, dt): ... return timedelta(hours=1) ... def dst(self, dt): ... return timedelta(0) ... def tzname(self,dt): ... return "+01:00" ... def __repr__(self): ... return f"{self.__class__.__name__}()" ... >>> t = time(12, 10, 30, tzinfo=TZ1()) >>> t datetime.time(12, 10, 30, tzinfo=TZ1()) >>> t.isoformat() '12:10:30+01:00' >>> t.dst() datetime.timedelta(0) >>> t.tzname() '+01:00' >>> t.strftime("%H:%M:%S %Z") '12:10:30 +01:00' >>> 'The {} is {:%H:%M}.'.format("time", t) 'The time is 12:10.' "tzinfo" Objects ================ class datetime.tzinfo This is an abstract base class, meaning that this class should not be instantiated directly. Define a subclass of "tzinfo" to capture information about a particular time zone. An instance of (a concrete subclass of) "tzinfo" can be passed to the constructors for "datetime" and "time" objects. The latter objects view their attributes as being in local time, and the "tzinfo" object supports methods revealing offset of local time from UTC, the name of the time zone, and DST offset, all relative to a date or time object passed to them. You need to derive a concrete subclass, and (at least) supply implementations of the standard "tzinfo" methods needed by the "datetime" methods you use. The "datetime" module provides "timezone", a simple concrete subclass of "tzinfo" which can represent time zones with fixed offset from UTC such as UTC itself or North American EST and EDT. Special requirement for pickling: A "tzinfo" subclass must have an "__init__()" method that can be called with no arguments, otherwise it can be pickled but possibly not unpickled again. This is a technical requirement that may be relaxed in the future. A concrete subclass of "tzinfo" may need to implement the following methods. Exactly which methods are needed depends on the uses made of aware "datetime" objects. If in doubt, simply implement all of them. tzinfo.utcoffset(dt) Return offset of local time from UTC, as a "timedelta" object that is positive east of UTC. If local time is west of UTC, this should be negative. This represents the *total* offset from UTC; for example, if a "tzinfo" object represents both time zone and DST adjustments, "utcoffset()" should return their sum. If the UTC offset isn’t known, return "None". Else the value returned must be a "timedelta" object strictly between "-timedelta(hours=24)" and "timedelta(hours=24)" (the magnitude of the offset must be less than one day). 
Most implementations of "utcoffset()" will probably look like one of these two: return CONSTANT # fixed-offset class return CONSTANT + self.dst(dt) # daylight-aware class If "utcoffset()" does not return "None", "dst()" should not return "None" either. The default implementation of "utcoffset()" raises "NotImplementedError". Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. tzinfo.dst(dt) Return the daylight saving time (DST) adjustment, as a "timedelta" object or "None" if DST information isn’t known. Return "timedelta(0)" if DST is not in effect. If DST is in effect, return the offset as a "timedelta" object (see "utcoffset()" for details). Note that DST offset, if applicable, has already been added to the UTC offset returned by "utcoffset()", so there’s no need to consult "dst()" unless you’re interested in obtaining DST info separately. For example, "datetime.timetuple()" calls its "tzinfo" attribute’s "dst()" method to determine how the "tm_isdst" flag should be set, and "tzinfo.fromutc()" calls "dst()" to account for DST changes when crossing time zones. An instance *tz* of a "tzinfo" subclass that models both standard and daylight times must be consistent in this sense: "tz.utcoffset(dt) - tz.dst(dt)" must return the same result for every "datetime" *dt* with "dt.tzinfo == tz". For sane "tzinfo" subclasses, this expression yields the time zone’s “standard offset”, which should not depend on the date or the time, but only on geographic location. The implementation of "datetime.astimezone()" relies on this, but cannot detect violations; it’s the programmer’s responsibility to ensure it. If a "tzinfo" subclass cannot guarantee this, it may be able to override the default implementation of "tzinfo.fromutc()" to work correctly with "astimezone()" regardless. Most implementations of "dst()" will probably look like one of these two: def dst(self, dt): # a fixed-offset class: doesn't account for DST return timedelta(0) or: def dst(self, dt): # Code to set dston and dstoff to the time zone's DST # transition times based on the input dt.year, and expressed # in standard local time. if dston <= dt.replace(tzinfo=None) < dstoff: return timedelta(hours=1) else: return timedelta(0) The default implementation of "dst()" raises "NotImplementedError". Changed in version 3.7: The DST offset is not restricted to a whole number of minutes. tzinfo.tzname(dt) Return the time zone name corresponding to the "datetime" object *dt*, as a string. Nothing about string names is defined by the "datetime" module, and there’s no requirement that it mean anything in particular. For example, ""GMT"", ""UTC"", ""-500"", ""-5:00"", ""EDT"", ""US/Eastern"", ""America/New York"" are all valid replies. Return "None" if a string name isn’t known. Note that this is a method rather than a fixed string primarily because some "tzinfo" subclasses will wish to return different names depending on the specific value of *dt* passed, especially if the "tzinfo" class is accounting for daylight time. The default implementation of "tzname()" raises "NotImplementedError". These methods are called by a "datetime" or "time" object, in response to their methods of the same names. A "datetime" object passes itself as the argument, and a "time" object passes "None" as the argument. A "tzinfo" subclass’s methods should therefore be prepared to accept a *dt* argument of "None", or of class "datetime". When "None" is passed, it’s up to the class designer to decide the best response. 
For example, returning "None" is appropriate if the class wishes to say that time objects don’t participate in the "tzinfo" protocols. It may be more useful for "utcoffset(None)" to return the standard UTC offset, as there is no other convention for discovering the standard offset. When a "datetime" object is passed in response to a "datetime" method, "dt.tzinfo" is the same object as *self*. "tzinfo" methods can rely on this, unless user code calls "tzinfo" methods directly. The intent is that the "tzinfo" methods interpret *dt* as being in local time, and not need worry about objects in other time zones. There is one more "tzinfo" method that a subclass may wish to override: tzinfo.fromutc(dt) This is called from the default "datetime.astimezone()" implementation. When called from that, "dt.tzinfo" is *self*, and *dt*’s date and time data are to be viewed as expressing a UTC time. The purpose of "fromutc()" is to adjust the date and time data, returning an equivalent datetime in *self*’s local time. Most "tzinfo" subclasses should be able to inherit the default "fromutc()" implementation without problems. It’s strong enough to handle fixed-offset time zones, and time zones accounting for both standard and daylight time, and the latter even if the DST transition times differ in different years. An example of a time zone the default "fromutc()" implementation may not handle correctly in all cases is one where the standard offset (from UTC) depends on the specific date and time passed, which can happen for political reasons. The default implementations of "astimezone()" and "fromutc()" may not produce the result you want if the result is one of the hours straddling the moment the standard offset changes. Skipping code for error cases, the default "fromutc()" implementation acts like: def fromutc(self, dt): # raise ValueError error if dt.tzinfo is not self dtoff = dt.utcoffset() dtdst = dt.dst() # raise ValueError if dtoff is None or dtdst is None delta = dtoff - dtdst # this is self's standard offset if delta: dt += delta # convert to standard local time dtdst = dt.dst() # raise ValueError if dtdst is None if dtdst: return dt + dtdst else: return dt In the following "tzinfo_examples.py" file there are some examples of "tzinfo" classes: from datetime import tzinfo, timedelta, datetime ZERO = timedelta(0) HOUR = timedelta(hours=1) SECOND = timedelta(seconds=1) # A class capturing the platform's idea of local time. # (May result in wrong values on historical times in # timezones where UTC offset and/or the DST rules had # changed in the past.) 
import time as _time STDOFFSET = timedelta(seconds = -_time.timezone) if _time.daylight: DSTOFFSET = timedelta(seconds = -_time.altzone) else: DSTOFFSET = STDOFFSET DSTDIFF = DSTOFFSET - STDOFFSET class LocalTimezone(tzinfo): def fromutc(self, dt): assert dt.tzinfo is self stamp = (dt - datetime(1970, 1, 1, tzinfo=self)) // SECOND args = _time.localtime(stamp)[:6] dst_diff = DSTDIFF // SECOND # Detect fold fold = (args == _time.localtime(stamp - dst_diff)) return datetime(*args, microsecond=dt.microsecond, tzinfo=self, fold=fold) def utcoffset(self, dt): if self._isdst(dt): return DSTOFFSET else: return STDOFFSET def dst(self, dt): if self._isdst(dt): return DSTDIFF else: return ZERO def tzname(self, dt): return _time.tzname[self._isdst(dt)] def _isdst(self, dt): tt = (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.weekday(), 0, 0) stamp = _time.mktime(tt) tt = _time.localtime(stamp) return tt.tm_isdst > 0 Local = LocalTimezone() # A complete implementation of current DST rules for major US time zones. def first_sunday_on_or_after(dt): days_to_go = 6 - dt.weekday() if days_to_go: dt += timedelta(days_to_go) return dt # US DST Rules # # This is a simplified (i.e., wrong for a few cases) set of rules for US # DST start and end times. For a complete and up-to-date set of DST rules # and timezone definitions, visit the Olson Database (or try pytz): # http://www.twinsun.com/tz/tz-link.htm # https://sourceforge.net/projects/pytz/ (might not be up-to-date) # # In the US, since 2007, DST starts at 2am (standard time) on the second # Sunday in March, which is the first Sunday on or after Mar 8. DSTSTART_2007 = datetime(1, 3, 8, 2) # and ends at 2am (DST time) on the first Sunday of Nov. DSTEND_2007 = datetime(1, 11, 1, 2) # From 1987 to 2006, DST used to start at 2am (standard time) on the first # Sunday in April and to end at 2am (DST time) on the last # Sunday of October, which is the first Sunday on or after Oct 25. DSTSTART_1987_2006 = datetime(1, 4, 1, 2) DSTEND_1987_2006 = datetime(1, 10, 25, 2) # From 1967 to 1986, DST used to start at 2am (standard time) on the last # Sunday in April (the one on or after April 24) and to end at 2am (DST time) # on the last Sunday of October, which is the first Sunday # on or after Oct 25. DSTSTART_1967_1986 = datetime(1, 4, 24, 2) DSTEND_1967_1986 = DSTEND_1987_2006 def us_dst_range(year): # Find start and end times for US DST. For years before 1967, return # start = end for no DST. if 2006 < year: dststart, dstend = DSTSTART_2007, DSTEND_2007 elif 1986 < year < 2007: dststart, dstend = DSTSTART_1987_2006, DSTEND_1987_2006 elif 1966 < year < 1987: dststart, dstend = DSTSTART_1967_1986, DSTEND_1967_1986 else: return (datetime(year, 1, 1), ) * 2 start = first_sunday_on_or_after(dststart.replace(year=year)) end = first_sunday_on_or_after(dstend.replace(year=year)) return start, end class USTimeZone(tzinfo): def __init__(self, hours, reprname, stdname, dstname): self.stdoffset = timedelta(hours=hours) self.reprname = reprname self.stdname = stdname self.dstname = dstname def __repr__(self): return self.reprname def tzname(self, dt): if self.dst(dt): return self.dstname else: return self.stdname def utcoffset(self, dt): return self.stdoffset + self.dst(dt) def dst(self, dt): if dt is None or dt.tzinfo is None: # An exception may be sensible here, in one or both cases. # It depends on how you want to treat them. The default # fromutc() implementation (called by the default astimezone() # implementation) passes a datetime with dt.tzinfo is self. 
return ZERO assert dt.tzinfo is self start, end = us_dst_range(dt.year) # Can't compare naive to aware objects, so strip the timezone from # dt first. dt = dt.replace(tzinfo=None) if start + HOUR <= dt < end - HOUR: # DST is in effect. return HOUR if end - HOUR <= dt < end: # Fold (an ambiguous hour): use dt.fold to disambiguate. return ZERO if dt.fold else HOUR if start <= dt < start + HOUR: # Gap (a non-existent hour): reverse the fold rule. return HOUR if dt.fold else ZERO # DST is off. return ZERO def fromutc(self, dt): assert dt.tzinfo is self start, end = us_dst_range(dt.year) start = start.replace(tzinfo=self) end = end.replace(tzinfo=self) std_time = dt + self.stdoffset dst_time = std_time + HOUR if end <= dst_time < end + HOUR: # Repeated hour return std_time.replace(fold=1) if std_time < start or dst_time >= end: # Standard time return std_time if start <= std_time < end - HOUR: # Daylight saving time return dst_time Eastern = USTimeZone(-5, "Eastern", "EST", "EDT") Central = USTimeZone(-6, "Central", "CST", "CDT") Mountain = USTimeZone(-7, "Mountain", "MST", "MDT") Pacific = USTimeZone(-8, "Pacific", "PST", "PDT") Note that there are unavoidable subtleties twice per year in a "tzinfo" subclass accounting for both standard and daylight time, at the DST transition points. For concreteness, consider US Eastern (UTC -0500), where EDT begins the minute after 1:59 (EST) on the second Sunday in March, and ends the minute after 1:59 (EDT) on the first Sunday in November: UTC 3:MM 4:MM 5:MM 6:MM 7:MM 8:MM EST 22:MM 23:MM 0:MM 1:MM 2:MM 3:MM EDT 23:MM 0:MM 1:MM 2:MM 3:MM 4:MM start 22:MM 23:MM 0:MM 1:MM 3:MM 4:MM end 23:MM 0:MM 1:MM 1:MM 2:MM 3:MM When DST starts (the “start” line), the local wall clock leaps from 1:59 to 3:00. A wall time of the form 2:MM doesn’t really make sense on that day, so "astimezone(Eastern)" won’t deliver a result with "hour == 2" on the day DST begins. For example, at the Spring forward transition of 2016, we get: >>> from datetime import datetime, timezone >>> from tzinfo_examples import HOUR, Eastern >>> u0 = datetime(2016, 3, 13, 5, tzinfo=timezone.utc) >>> for i in range(4): ... u = u0 + i*HOUR ... t = u.astimezone(Eastern) ... print(u.time(), 'UTC =', t.time(), t.tzname()) ... 05:00:00 UTC = 00:00:00 EST 06:00:00 UTC = 01:00:00 EST 07:00:00 UTC = 03:00:00 EDT 08:00:00 UTC = 04:00:00 EDT When DST ends (the “end” line), there’s a potentially worse problem: there’s an hour that can’t be spelled unambiguously in local wall time: the last hour of daylight time. In Eastern, that’s times of the form 5:MM UTC on the day daylight time ends. The local wall clock leaps from 1:59 (daylight time) back to 1:00 (standard time) again. Local times of the form 1:MM are ambiguous. "astimezone()" mimics the local clock’s behavior by mapping two adjacent UTC hours into the same local hour then. In the Eastern example, UTC times of the form 5:MM and 6:MM both map to 1:MM when converted to Eastern, but earlier times have the "fold" attribute set to 0 and the later times have it set to 1. For example, at the Fall back transition of 2016, we get: >>> u0 = datetime(2016, 11, 6, 4, tzinfo=timezone.utc) >>> for i in range(4): ... u = u0 + i*HOUR ... t = u.astimezone(Eastern) ... print(u.time(), 'UTC =', t.time(), t.tzname(), t.fold) ... 04:00:00 UTC = 00:00:00 EDT 0 05:00:00 UTC = 01:00:00 EDT 0 06:00:00 UTC = 01:00:00 EST 1 07:00:00 UTC = 02:00:00 EST 0 Note that the "datetime" instances that differ only by the value of the "fold" attribute are considered equal in comparisons. 
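To make that last point concrete, here is a small added illustration (assuming the "tzinfo_examples" module from above is importable, as in the sessions above): two instances that differ only in "fold" compare equal, even though they resolve to different zone names:

   >>> from datetime import datetime
   >>> from tzinfo_examples import Eastern
   >>> t0 = datetime(2016, 11, 6, 1, 30, tzinfo=Eastern)   # fold defaults to 0
   >>> t1 = t0.replace(fold=1)
   >>> t0 == t1
   True
   >>> t0.tzname(), t1.tzname()
   ('EDT', 'EST')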
Applications that can’t bear wall-time ambiguities should explicitly check the value of the "fold" attribute or avoid using hybrid "tzinfo" subclasses; there are no ambiguities when using "timezone", or any other fixed-offset "tzinfo" subclass (such as a class representing only EST (fixed offset -5 hours), or only EDT (fixed offset -4 hours)). See also: "zoneinfo" The "datetime" module has a basic "timezone" class (for handling arbitrary fixed offsets from UTC) and its "timezone.utc" attribute (a UTC "timezone" instance). "zoneinfo" brings the *IANA time zone database* (also known as the Olson database) to Python, and its usage is recommended. IANA time zone database The Time Zone Database (often called tz, tzdata or zoneinfo) contains code and data that represent the history of local time for many representative locations around the globe. It is updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight-saving rules. "timezone" Objects ================== The "timezone" class is a subclass of "tzinfo", each instance of which represents a time zone defined by a fixed offset from UTC. Objects of this class cannot be used to represent time zone information in the locations where different offsets are used in different days of the year or where historical changes have been made to civil time. class datetime.timezone(offset, name=None) The *offset* argument must be specified as a "timedelta" object representing the difference between the local time and UTC. It must be strictly between "-timedelta(hours=24)" and "timedelta(hours=24)", otherwise "ValueError" is raised. The *name* argument is optional. If specified it must be a string that will be used as the value returned by the "datetime.tzname()" method. Added in version 3.2. Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. timezone.utcoffset(dt) Return the fixed value specified when the "timezone" instance is constructed. The *dt* argument is ignored. The return value is a "timedelta" instance equal to the difference between the local time and UTC. Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. timezone.tzname(dt) Return the fixed value specified when the "timezone" instance is constructed. If *name* is not provided in the constructor, the name returned by "tzname(dt)" is generated from the value of the "offset" as follows. If *offset* is "timedelta(0)", the name is “UTC”, otherwise it is a string in the format "UTC±HH:MM", where ± is the sign of "offset", HH and MM are two digits of "offset.hours" and "offset.minutes" respectively. Changed in version 3.6: Name generated from "offset=timedelta(0)" is now plain "'UTC'", not "'UTC+00:00'". timezone.dst(dt) Always returns "None". timezone.fromutc(dt) Return "dt + offset". The *dt* argument must be an aware "datetime" instance, with "tzinfo" set to "self". Class attributes: timezone.utc The UTC time zone, "timezone(timedelta(0))". "strftime()" and "strptime()" Behavior ====================================== "date", "datetime", and "time" objects all support a "strftime(format)" method, to create a string representing the time under the control of an explicit format string. Conversely, the "datetime.strptime()" class method creates a "datetime" object from a string representing a date and time and a corresponding format string. 
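For example (an added illustrative snippet; the date and format string are arbitrary), the two methods work in opposite directions, "strftime()" as an instance method and "strptime()" as a constructor-style class method:

   >>> from datetime import date, datetime
   >>> date(2002, 12, 4).strftime("%Y-%m-%d")        # instance method
   '2002-12-04'
   >>> datetime.strptime("2002-12-04", "%Y-%m-%d")   # class method
   datetime.datetime(2002, 12, 4, 0, 0)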
The table below provides a high-level comparison of "strftime()" versus "strptime()": +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ | | "strftime" | "strptime" | |==================|==========================================================|================================================================================| | Usage | Convert object to a string according to a given format | Parse a string into a "datetime" object given a corresponding format | +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ | Type of method | Instance method | Class method | +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ | Method of | "date"; "datetime"; "time" | "datetime" | +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ | Signature | "strftime(format)" | "strptime(date_string, format)" | +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ "strftime()" and "strptime()" Format Codes ------------------------------------------ These methods accept format codes that can be used to parse and format dates: >>> datetime.strptime('31/01/22 23:59:59.999999', ... '%d/%m/%y %H:%M:%S.%f') datetime.datetime(2022, 1, 31, 23, 59, 59, 999999) >>> _.strftime('%a %d %b %Y, %I:%M%p') 'Mon 31 Jan 2022, 11:59PM' The following is a list of all the format codes that the 1989 C standard requires, and these work on all platforms with a standard C implementation. +-------------+----------------------------------+--------------------------+---------+ | Directive | Meaning | Example | Notes | |=============|==================================|==========================|=========| | "%a" | Weekday as locale’s abbreviated | Sun, Mon, …, Sat | (1) | | | name. | (en_US); So, Mo, …, Sa | | | | | (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%A" | Weekday as locale’s full name. | Sunday, Monday, …, | (1) | | | | Saturday (en_US); | | | | | Sonntag, Montag, …, | | | | | Samstag (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%w" | Weekday as a decimal number, | 0, 1, …, 6 | | | | where 0 is Sunday and 6 is | | | | | Saturday. | | | +-------------+----------------------------------+--------------------------+---------+ | "%d" | Day of the month as a zero- | 01, 02, …, 31 | (9) | | | padded decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%b" | Month as locale’s abbreviated | Jan, Feb, …, Dec | (1) | | | name. | (en_US); Jan, Feb, …, | | | | | Dez (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%B" | Month as locale’s full name. | January, February, …, | (1) | | | | December (en_US); | | | | | Januar, Februar, …, | | | | | Dezember (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%m" | Month as a zero-padded decimal | 01, 02, …, 12 | (9) | | | number. 
| | | +-------------+----------------------------------+--------------------------+---------+ | "%y" | Year without century as a zero- | 00, 01, …, 99 | (9) | | | padded decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%Y" | Year with century as a decimal | 0001, 0002, …, 2013, | (2) | | | number. | 2014, …, 9998, 9999 | | +-------------+----------------------------------+--------------------------+---------+ | "%H" | Hour (24-hour clock) as a zero- | 00, 01, …, 23 | (9) | | | padded decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%I" | Hour (12-hour clock) as a zero- | 01, 02, …, 12 | (9) | | | padded decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%p" | Locale’s equivalent of either AM | AM, PM (en_US); am, pm | (1), | | | or PM. | (de_DE) | (3) | +-------------+----------------------------------+--------------------------+---------+ | "%M" | Minute as a zero-padded decimal | 00, 01, …, 59 | (9) | | | number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%S" | Second as a zero-padded decimal | 00, 01, …, 59 | (4), | | | number. | | (9) | +-------------+----------------------------------+--------------------------+---------+ | "%f" | Microsecond as a decimal number, | 000000, 000001, …, | (5) | | | zero-padded to 6 digits. | 999999 | | +-------------+----------------------------------+--------------------------+---------+ | "%z" | UTC offset in the form | (empty), +0000, -0400, | (6) | | | "±HHMM[SS[.ffffff]]" (empty | +1030, +063415, | | | | string if the object is naive). | -030712.345216 | | +-------------+----------------------------------+--------------------------+---------+ | "%Z" | Time zone name (empty string if | (empty), UTC, GMT | (6) | | | the object is naive). | | | +-------------+----------------------------------+--------------------------+---------+ | "%j" | Day of the year as a zero-padded | 001, 002, …, 366 | (9) | | | decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%U" | Week number of the year (Sunday | 00, 01, …, 53 | (7), | | | as the first day of the week) as | | (9) | | | a zero-padded decimal number. | | | | | All days in a new year preceding | | | | | the first Sunday are considered | | | | | to be in week 0. | | | +-------------+----------------------------------+--------------------------+---------+ | "%W" | Week number of the year (Monday | 00, 01, …, 53 | (7), | | | as the first day of the week) as | | (9) | | | a zero-padded decimal number. | | | | | All days in a new year preceding | | | | | the first Monday are considered | | | | | to be in week 0. | | | +-------------+----------------------------------+--------------------------+---------+ | "%c" | Locale’s appropriate date and | Tue Aug 16 21:30:00 1988 | (1) | | | time representation. | (en_US); Di 16 Aug | | | | | 21:30:00 1988 (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%x" | Locale’s appropriate date | 08/16/88 (None); | (1) | | | representation. | 08/16/1988 (en_US); | | | | | 16.08.1988 (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%X" | Locale’s appropriate time | 21:30:00 (en_US); | (1) | | | representation. 
| 21:30:00 (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%%" | A literal "'%'" character. | % | | +-------------+----------------------------------+--------------------------+---------+ Several additional directives not required by the C89 standard are included for convenience. These parameters all correspond to ISO 8601 date values. +-------------+----------------------------------+--------------------------+---------+ | Directive | Meaning | Example | Notes | |=============|==================================|==========================|=========| | "%G" | ISO 8601 year with century | 0001, 0002, …, 2013, | (8) | | | representing the year that | 2014, …, 9998, 9999 | | | | contains the greater part of the | | | | | ISO week ("%V"). | | | +-------------+----------------------------------+--------------------------+---------+ | "%u" | ISO 8601 weekday as a decimal | 1, 2, …, 7 | | | | number where 1 is Monday. | | | +-------------+----------------------------------+--------------------------+---------+ | "%V" | ISO 8601 week as a decimal | 01, 02, …, 53 | (8), | | | number with Monday as the first | | (9) | | | day of the week. Week 01 is the | | | | | week containing Jan 4. | | | +-------------+----------------------------------+--------------------------+---------+ | "%:z" | UTC offset in the form | (empty), +00:00, -04:00, | (6) | | | "±HH:MM[:SS[.ffffff]]" (empty | +10:30, +06:34:15, | | | | string if the object is naive). | -03:07:12.345216 | | +-------------+----------------------------------+--------------------------+---------+ These may not be available on all platforms when used with the "strftime()" method. The ISO 8601 year and ISO 8601 week directives are not interchangeable with the year and week number directives above. Calling "strptime()" with incomplete or ambiguous ISO 8601 directives will raise a "ValueError". The full set of format codes supported varies across platforms, because Python calls the platform C library’s "strftime()" function, and platform variations are common. To see the full set of format codes supported on your platform, consult the *strftime(3)* documentation. There are also differences between platforms in handling of unsupported format specifiers. Added in version 3.6: "%G", "%u" and "%V" were added. Added in version 3.12: "%:z" was added. Technical Detail ---------------- Broadly speaking, "d.strftime(fmt)" acts like the "time" module’s "time.strftime(fmt, d.timetuple())" although not all objects support a "timetuple()" method. For the "datetime.strptime()" class method, the default value is "1900-01-01T00:00:00.000": any components not specified in the format string will be pulled from the default value. [4] Using "datetime.strptime(date_string, format)" is equivalent to: datetime(*(time.strptime(date_string, format)[0:6])) except when the format includes sub-second components or time zone offset information, which are supported in "datetime.strptime" but are discarded by "time.strptime". For "time" objects, the format codes for year, month, and day should not be used, as "time" objects have no such values. If they’re used anyway, 1900 is substituted for the year, and 1 for the month and day. For "date" objects, the format codes for hours, minutes, seconds, and microseconds should not be used, as "date" objects have no such values. If they’re used anyway, 0 is substituted for them. 
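As a brief added illustration of the default-value behavior described above (the time value is arbitrary), components missing from the format string are taken from the 1900-01-01 default:

   >>> from datetime import datetime
   >>> datetime.strptime("14:30", "%H:%M")
   datetime.datetime(1900, 1, 1, 14, 30)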
Because "strftime()" ultimately relies on the platform C library, handling of format strings containing Unicode code points that can’t be represented in the charset of the current locale is also platform-dependent. On some platforms such code points are preserved intact in the output, while on others "strftime" may raise "UnicodeError" or return an empty string instead.

Notes:

1. Because the format depends on the current locale, care should be taken when making assumptions about the output value. Field orderings will vary (for example, “month/day/year” versus “day/month/year”), and the output may contain non-ASCII characters.

2. The "strptime()" method can parse years in the full [1, 9999] range, but years < 1000 must be zero-filled to 4-digit width.

   Changed in version 3.2: In previous versions, the "strftime()" method was restricted to years >= 1900.

   Changed in version 3.3: In version 3.2, the "strftime()" method was restricted to years >= 1000.

3. When used with the "strptime()" method, the "%p" directive only affects the output hour field if the "%I" directive is used to parse the hour.

4. Unlike the "time" module, the "datetime" module does not support leap seconds.

5. When used with the "strptime()" method, the "%f" directive accepts from one to six digits and zero pads on the right. "%f" is an extension to the set of format characters in the C standard (but implemented separately in datetime objects, and therefore always available).

6. For a naive object, the "%z", "%:z" and "%Z" format codes are replaced by empty strings.

   For an aware object:

   "%z"
      "utcoffset()" is transformed into a string of the form "±HHMM[SS[.ffffff]]", where "HH" is a 2-digit string giving the number of UTC offset hours, "MM" is a 2-digit string giving the number of UTC offset minutes, "SS" is a 2-digit string giving the number of UTC offset seconds and "ffffff" is a 6-digit string giving the number of UTC offset microseconds. The "ffffff" part is omitted when the offset is a whole number of seconds, and both the "ffffff" and "SS" parts are omitted when the offset is a whole number of minutes. For example, if "utcoffset()" returns "timedelta(hours=-3, minutes=-30)", "%z" is replaced with the string "'-0330'".

      Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes.

      Changed in version 3.7: When the "%z" directive is provided to the "strptime()" method, the UTC offsets can have a colon as a separator between hours, minutes and seconds. For example, "'+01:00:00'" will be parsed as an offset of one hour. In addition, providing "'Z'" is identical to "'+00:00'".

   "%:z"
      Behaves exactly as "%z", but has a colon separator added between hours, minutes and seconds.

   "%Z"
      In "strftime()", "%Z" is replaced by an empty string if "tzname()" returns "None"; otherwise "%Z" is replaced by the returned value, which must be a string.

      "strptime()" only accepts certain values for "%Z":

      1. any value in "time.tzname" for your machine’s locale

      2. the hard-coded values "UTC" and "GMT"

      So someone living in Japan may have "JST", "UTC", and "GMT" as valid values, but probably not "EST". It will raise "ValueError" for invalid values.

      Changed in version 3.2: When the "%z" directive is provided to the "strptime()" method, an aware "datetime" object will be produced. The "tzinfo" of the result will be set to a "timezone" instance.

7. When used with the "strptime()" method, "%U" and "%W" are only used in calculations when the day of the week and the calendar year ("%Y") are specified.

8.
Similar to "%U" and "%W", "%V" is only used in calculations when the day of the week and the ISO year ("%G") are specified in a "strptime()" format string. Also note that "%G" and "%Y" are not interchangeable. 9. When used with the "strptime()" method, the leading zero is optional for formats "%d", "%m", "%H", "%I", "%M", "%S", "%j", "%U", "%W", and "%V". Format "%y" does require a leading zero. 10. When parsing a month and day using "strptime()", always include a year in the format. If the value you need to parse lacks a year, append an explicit dummy leap year. Otherwise your code will raise an exception when it encounters leap day because the default year used by the parser is not a leap year. Users run into this bug every four years… >>> month_day = "02/29" >>> datetime.strptime(f"{month_day};1984", "%m/%d;%Y") # No leap year bug. datetime.datetime(1984, 2, 29, 0, 0) Deprecated since version 3.13, will be removed in version 3.15: "strptime()" calls using a format string containing a day of month without a year now emit a "DeprecationWarning". In 3.15 or later we may change this into an error or change the default year to a leap year. See gh-70647. -[ Footnotes ]- [1] If, that is, we ignore the effects of Relativity [2] This matches the definition of the “proleptic Gregorian” calendar in Dershowitz and Reingold’s book *Calendrical Calculations*, where it’s the base calendar for all computations. See the book for algorithms for converting between proleptic Gregorian ordinals and many other calendar systems. [3] See R. H. van Gent’s guide to the mathematics of the ISO 8601 calendar for a good explanation. [4] Passing "datetime.strptime('Feb 29', '%b %d')" will fail since 1900 is not a leap year. "dbm" — Interfaces to Unix “databases” ************************************** **Source code:** Lib/dbm/__init__.py ====================================================================== "dbm" is a generic interface to variants of the DBM database: * "dbm.sqlite3" * "dbm.gnu" * "dbm.ndbm" If none of these modules are installed, the slow-but-simple implementation in module "dbm.dumb" will be used. There is a third party interface to the Oracle Berkeley DB. exception dbm.error A tuple containing the exceptions that can be raised by each of the supported modules, with a unique exception also named "dbm.error" as the first item — the latter is used when "dbm.error" is raised. dbm.whichdb(filename) This function attempts to guess which of the several simple database modules available — "dbm.sqlite3", "dbm.gnu", "dbm.ndbm", or "dbm.dumb" — should be used to open a given file. Return one of the following values: * "None" if the file can’t be opened because it’s unreadable or doesn’t exist * the empty string ("''") if the file’s format can’t be guessed * a string containing the required module name, such as "'dbm.ndbm'" or "'dbm.gnu'" Changed in version 3.11: *filename* accepts a *path-like object*. dbm.open(file, flag='r', mode=0o666) Open a database and return the corresponding database object. Parameters: * **file** (*path-like object*) – The database file to open. If the database file already exists, the "whichdb()" function is used to determine its type and the appropriate module is used; if it does not exist, the first submodule listed above that can be imported is used. * **flag** (*str*) – * "'r'" (default): Open existing database for reading only. * "'w'": Open existing database for reading and writing. * "'c'": Open database for reading and writing, creating it if it doesn’t exist. 
* "'n'": Always create a new, empty database, open for reading and writing. * **mode** (*int*) – The Unix file access mode of the file (default: octal "0o666"), used only when the database has to be created. Changed in version 3.11: *file* accepts a *path-like object*. The object returned by "open()" supports the same basic functionality as a "dict"; keys and their corresponding values can be stored, retrieved, and deleted, and the "in" operator and the "keys()" method are available, as well as "get()" and "setdefault()" methods. Key and values are always stored as "bytes". This means that when strings are used they are implicitly converted to the default encoding before being stored. These objects also support being used in a "with" statement, which will automatically close them when done. Changed in version 3.2: "get()" and "setdefault()" methods are now available for all "dbm" backends. Changed in version 3.4: Added native support for the context management protocol to the objects returned by "open()". Changed in version 3.8: Deleting a key from a read-only database raises a database module specific exception instead of "KeyError". The following example records some hostnames and a corresponding title, and then prints out the contents of the database: import dbm # Open database, creating it if necessary. with dbm.open('cache', 'c') as db: # Record some values db[b'hello'] = b'there' db['www.python.org'] = 'Python Website' db['www.cnn.com'] = 'Cable News Network' # Note that the keys are considered bytes now. assert db[b'www.python.org'] == b'Python Website' # Notice how the value is now in bytes. assert db['www.cnn.com'] == b'Cable News Network' # Often-used methods of the dict interface work too. print(db.get('python.org', b'not present')) # Storing a non-string key or value will raise an exception (most # likely a TypeError). db['www.yahoo.com'] = 4 # db is automatically closed when leaving the with statement. See also: Module "shelve" Persistence module which stores non-string data. The individual submodules are described in the following sections. "dbm.sqlite3" — SQLite backend for dbm ====================================== Added in version 3.13. **Source code:** Lib/dbm/sqlite3.py ====================================================================== This module uses the standard library "sqlite3" module to provide an SQLite backend for the "dbm" module. The files created by "dbm.sqlite3" can thus be opened by "sqlite3", or any other SQLite browser, including the SQLite CLI. Availability: not WASI. This module does not work or is not available on WebAssembly. See WebAssembly platforms for more information. dbm.sqlite3.open(filename, /, flag='r', mode=0o666) Open an SQLite database. The returned object behaves like a *mapping*, implements a "close()" method, and supports a “closing” context manager via the "with" keyword. Parameters: * **filename** (*path-like object*) – The path to the database to be opened. * **flag** (*str*) – * "'r'" (default): Open existing database for reading only. * "'w'": Open existing database for reading and writing. * "'c'": Open database for reading and writing, creating it if it doesn’t exist. * "'n'": Always create a new, empty database, open for reading and writing. * **mode** – The Unix file access mode of the file (default: octal "0o666"), used only when the database has to be created. 
"dbm.gnu" — GNU database manager ================================ **Source code:** Lib/dbm/gnu.py ====================================================================== The "dbm.gnu" module provides an interface to the GDBM (GNU dbm) library, similar to the "dbm.ndbm" module, but with additional functionality like crash tolerance. Note: The file formats created by "dbm.gnu" and "dbm.ndbm" are incompatible and can not be used interchangeably. Availability: not Android, not iOS, not WASI. This module is not supported on mobile platforms or WebAssembly platforms. exception dbm.gnu.error Raised on "dbm.gnu"-specific errors, such as I/O errors. "KeyError" is raised for general mapping errors like specifying an incorrect key. dbm.gnu.open(filename, flag='r', mode=0o666, /) Open a GDBM database and return a "gdbm" object. Parameters: * **filename** (*path-like object*) – The database file to open. * **flag** (*str*) – * "'r'" (default): Open existing database for reading only. * "'w'": Open existing database for reading and writing. * "'c'": Open database for reading and writing, creating it if it doesn’t exist. * "'n'": Always create a new, empty database, open for reading and writing. The following additional characters may be appended to control how the database is opened: * "'f'": Open the database in fast mode. Writes to the database will not be synchronized. * "'s'": Synchronized mode. Changes to the database will be written immediately to the file. * "'u'": Do not lock database. Not all flags are valid for all versions of GDBM. See the "open_flags" member for a list of supported flag characters. * **mode** (*int*) – The Unix file access mode of the file (default: octal "0o666"), used only when the database has to be created. Raises: **error** – If an invalid *flag* argument is passed. Changed in version 3.11: *filename* accepts a *path-like object*. dbm.gnu.open_flags A string of characters the *flag* parameter of "open()" supports. "gdbm" objects behave similar to *mappings*, but "items()" and "values()" methods are not supported. The following methods are also provided: gdbm.firstkey() It’s possible to loop over every key in the database using this method and the "nextkey()" method. The traversal is ordered by GDBM’s internal hash values, and won’t be sorted by the key values. This method returns the starting key. gdbm.nextkey(key) Returns the key that follows *key* in the traversal. The following code prints every key in the database "db", without having to create a list in memory that contains them all: k = db.firstkey() while k is not None: print(k) k = db.nextkey(k) gdbm.reorganize() If you have carried out a lot of deletions and would like to shrink the space used by the GDBM file, this routine will reorganize the database. "gdbm" objects will not shorten the length of a database file except by using this reorganization; otherwise, deleted file space will be kept and reused as new (key, value) pairs are added. gdbm.sync() When the database has been opened in fast mode, this method forces any unwritten data to be written to the disk. gdbm.close() Close the GDBM database. gdbm.clear() Remove all items from the GDBM database. Added in version 3.13. "dbm.ndbm" — New Database Manager ================================= **Source code:** Lib/dbm/ndbm.py ====================================================================== The "dbm.ndbm" module provides an interface to the NDBM (New Database Manager) library. 
This module can be used with the “classic” NDBM interface or the GDBM (GNU dbm) compatibility interface.

Note: The file formats created by "dbm.gnu" and "dbm.ndbm" are incompatible and can not be used interchangeably.

Warning: The NDBM library shipped as part of macOS has an undocumented limitation on the size of values, which can result in corrupted database files when storing values larger than this limit. Reading such corrupted files can result in a hard crash (segmentation fault).

Availability: not Android, not iOS, not WASI. This module is not supported on mobile platforms or WebAssembly platforms.

exception dbm.ndbm.error

   Raised on "dbm.ndbm"-specific errors, such as I/O errors. "KeyError" is raised for general mapping errors like specifying an incorrect key.

dbm.ndbm.library

   Name of the NDBM implementation library used.

dbm.ndbm.open(filename, flag='r', mode=0o666, /)

   Open an NDBM database and return an "ndbm" object.

   Parameters:
      * **filename** (*path-like object*) – The basename of the database file (without the ".dir" or ".pag" extensions).

      * **flag** (*str*) –

        * "'r'" (default): Open existing database for reading only.

        * "'w'": Open existing database for reading and writing.

        * "'c'": Open database for reading and writing, creating it if it doesn’t exist.

        * "'n'": Always create a new, empty database, open for reading and writing.

      * **mode** (*int*) – The Unix file access mode of the file (default: octal "0o666"), used only when the database has to be created.

   Changed in version 3.11: Accepts *path-like object* for filename.

"ndbm" objects behave similar to *mappings*, but "items()" and "values()" methods are not supported. The following methods are also provided:

ndbm.close()

   Close the NDBM database.

ndbm.clear()

   Remove all items from the NDBM database.

   Added in version 3.13.


"dbm.dumb" — Portable DBM implementation
========================================

**Source code:** Lib/dbm/dumb.py

Note: The "dbm.dumb" module is intended as a last resort fallback for the "dbm" module when a more robust module is not available. The "dbm.dumb" module is not written for speed and is not nearly as heavily used as the other database modules.

======================================================================

The "dbm.dumb" module provides a persistent "dict"-like interface which is written entirely in Python. Unlike other "dbm" backends, such as "dbm.gnu", no external library is required.

The "dbm.dumb" module defines the following:

exception dbm.dumb.error

   Raised on "dbm.dumb"-specific errors, such as I/O errors. "KeyError" is raised for general mapping errors like specifying an incorrect key.

dbm.dumb.open(filename, flag='c', mode=0o666)

   Open a "dbm.dumb" database. The returned database object behaves similar to a *mapping*, in addition to providing "sync()" and "close()" methods.

   Parameters:
      * **filename** – The basename of the database file (without extensions). A new database creates the following files:

        * "*filename*.dat"

        * "*filename*.dir"

      * **flag** (*str*) –

        * "'r'": Open existing database for reading only.

        * "'w'": Open existing database for reading and writing.

        * "'c'" (default): Open database for reading and writing, creating it if it doesn’t exist.

        * "'n'": Always create a new, empty database, open for reading and writing.

      * **mode** (*int*) – The Unix file access mode of the file (default: octal "0o666"), used only when the database has to be created.
Warning: It is possible to crash the Python interpreter when loading a database with a sufficiently large/complex entry due to stack depth limitations in Python’s AST compiler. Changed in version 3.5: "open()" always creates a new database when *flag* is "'n'". Changed in version 3.8: A database opened read-only if *flag* is "'r'". A database is not created if it does not exist if *flag* is "'r'" or "'w'". Changed in version 3.11: *filename* accepts a *path-like object*. In addition to the methods provided by the "collections.abc.MutableMapping" class, the following methods are provided: dumbdbm.sync() Synchronize the on-disk directory and data files. This method is called by the "shelve.Shelf.sync()" method. dumbdbm.close() Close the database. Debugging and Profiling *********************** These libraries help you with Python development: the debugger enables you to step through code, analyze stack frames and set breakpoints etc., and the profilers run code and give you a detailed breakdown of execution times, allowing you to identify bottlenecks in your programs. Auditing events provide visibility into runtime behaviors that would otherwise require intrusive debugging or patching. * Audit events table * "bdb" — Debugger framework * "faulthandler" — Dump the Python traceback * Dumping the traceback * Fault handler state * Dumping the tracebacks after a timeout * Dumping the traceback on a user signal * Issue with file descriptors * Example * "pdb" — The Python Debugger * Debugger Commands * The Python Profilers * Introduction to the profilers * Instant User’s Manual * "profile" and "cProfile" Module Reference * The "Stats" Class * What Is Deterministic Profiling? * Limitations * Calibration * Using a custom timer * "timeit" — Measure execution time of small code snippets * Basic Examples * Python Interface * Command-Line Interface * Examples * "trace" — Trace or track Python statement execution * Command-Line Usage * Main options * Modifiers * Filters * Programmatic Interface * "tracemalloc" — Trace memory allocations * Examples * Display the top 10 * Compute differences * Get the traceback of a memory block * Pretty top * Record the current and peak size of all traced memory blocks * API * Functions * DomainFilter * Filter * Frame * Snapshot * Statistic * StatisticDiff * Trace * Traceback "decimal" — Decimal fixed-point and floating-point arithmetic ************************************************************* **Source code:** Lib/decimal.py ====================================================================== The "decimal" module provides support for fast correctly rounded decimal floating-point arithmetic. It offers several advantages over the "float" datatype: * Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.” – excerpt from the decimal arithmetic specification. * Decimal numbers can be represented exactly. In contrast, numbers like "1.1" and "2.2" do not have exact representations in binary floating point. End users typically would not expect "1.1 + 2.2" to display as "3.3000000000000003" as it does with binary floating point. * The exactness carries over into arithmetic. In decimal floating point, "0.1 + 0.1 + 0.1 - 0.3" is exactly equal to zero. In binary floating point, the result is "5.5511151231257827e-017". 
While near to zero, the differences prevent reliable equality testing and differences can accumulate. For this reason, decimal is preferred in accounting applications which have strict equality invariants. * The decimal module incorporates a notion of significant places so that "1.30 + 1.20" is "2.50". The trailing zero is kept to indicate significance. This is the customary presentation for monetary applications. For multiplication, the “schoolbook” approach uses all the figures in the multiplicands. For instance, "1.3 * 1.2" gives "1.56" while "1.30 * 1.20" gives "1.5600". * Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem: >>> from decimal import * >>> getcontext().prec = 6 >>> Decimal(1) / Decimal(7) Decimal('0.142857') >>> getcontext().prec = 28 >>> Decimal(1) / Decimal(7) Decimal('0.1428571428571428571428571429') * Both binary and decimal floating point are implemented in terms of published standards. While the built-in float type exposes only a modest portion of its capabilities, the decimal module exposes all required parts of the standard. When needed, the programmer has full control over rounding and signal handling. This includes an option to enforce exact arithmetic by using exceptions to block any inexact operations. * The decimal module was designed to support “without prejudice, both exact unrounded decimal arithmetic (sometimes called fixed-point arithmetic) and rounded floating-point arithmetic.” – excerpt from the decimal arithmetic specification. The module design is centered around three concepts: the decimal number, the context for arithmetic, and signals. A decimal number is immutable. It has a sign, coefficient digits, and an exponent. To preserve significance, the coefficient digits do not truncate trailing zeros. Decimals also include special values such as "Infinity", "-Infinity", and "NaN". The standard also differentiates "-0" from "+0". The context for arithmetic is an environment specifying precision, rounding rules, limits on exponents, flags indicating the results of operations, and trap enablers which determine whether signals are treated as exceptions. Rounding options include "ROUND_CEILING", "ROUND_DOWN", "ROUND_FLOOR", "ROUND_HALF_DOWN", "ROUND_HALF_EVEN", "ROUND_HALF_UP", "ROUND_UP", and "ROUND_05UP". Signals are groups of exceptional conditions arising during the course of computation. Depending on the needs of the application, signals may be ignored, considered as informational, or treated as exceptions. The signals in the decimal module are: "Clamped", "InvalidOperation", "DivisionByZero", "Inexact", "Rounded", "Subnormal", "Overflow", "Underflow" and "FloatOperation". For each signal there is a flag and a trap enabler. When a signal is encountered, its flag is set to one, then, if the trap enabler is set to one, an exception is raised. Flags are sticky, so the user needs to reset them before monitoring a calculation. See also: * IBM’s General Decimal Arithmetic Specification, The General Decimal Arithmetic Specification. 
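The flag-and-trap machinery just described can be exercised directly (a minimal sketch using the current thread’s context; only documented "decimal" API is used):

   from decimal import Decimal, getcontext, Inexact

   ctx = getcontext()
   ctx.clear_flags()

   # An inexact result quietly sets the sticky Inexact flag...
   Decimal(1) / Decimal(3)
   assert ctx.flags[Inexact]

   # ...while enabling the trap turns the same signal into an exception.
   ctx.clear_flags()
   ctx.traps[Inexact] = True
   try:
       Decimal(1) / Decimal(3)
   except Inexact:
       print("Inexact signal trapped")
   ctx.traps[Inexact] = False    # back to the default, non-raising behaviour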
Quick-start tutorial
====================

The usual start to using decimals is importing the module, viewing the current context with "getcontext()" and, if necessary, setting new values for precision, rounding, or enabled traps:

   >>> from decimal import *
   >>> getcontext()
   Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999,
           capitals=1, clamp=0, flags=[], traps=[Overflow, DivisionByZero,
           InvalidOperation])

   >>> getcontext().prec = 7       # Set a new precision

Decimal instances can be constructed from integers, strings, floats, or tuples. Construction from an integer or a float performs an exact conversion of the value of that integer or float. Decimal numbers include special values such as "NaN" which stands for “Not a number”, positive and negative "Infinity", and "-0":

   >>> getcontext().prec = 28
   >>> Decimal(10)
   Decimal('10')
   >>> Decimal('3.14')
   Decimal('3.14')
   >>> Decimal(3.14)
   Decimal('3.140000000000000124344978758017532527446746826171875')
   >>> Decimal((0, (3, 1, 4), -2))
   Decimal('3.14')
   >>> Decimal(str(2.0 ** 0.5))
   Decimal('1.4142135623730951')
   >>> Decimal(2) ** Decimal('0.5')
   Decimal('1.414213562373095048801688724')
   >>> Decimal('NaN')
   Decimal('NaN')
   >>> Decimal('-Infinity')
   Decimal('-Infinity')

If the "FloatOperation" signal is trapped, accidental mixing of decimals and floats in constructors or ordering comparisons raises an exception:

   >>> c = getcontext()
   >>> c.traps[FloatOperation] = True
   >>> Decimal(3.14)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   decimal.FloatOperation: [<class 'decimal.FloatOperation'>]
   >>> Decimal('3.5') < 3.7
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   decimal.FloatOperation: [<class 'decimal.FloatOperation'>]
   >>> Decimal('3.5') == 3.5
   True

Added in version 3.3.

The significance of a new Decimal is determined solely by the number of digits input. Context precision and rounding only come into play during arithmetic operations.

   >>> getcontext().prec = 6
   >>> Decimal('3.0')
   Decimal('3.0')
   >>> Decimal('3.1415926535')
   Decimal('3.1415926535')
   >>> Decimal('3.1415926535') + Decimal('2.7182818285')
   Decimal('5.85987')
   >>> getcontext().rounding = ROUND_UP
   >>> Decimal('3.1415926535') + Decimal('2.7182818285')
   Decimal('5.85988')

If the internal limits of the C version are exceeded, constructing a decimal raises "InvalidOperation":

   >>> Decimal("1e9999999999999999999")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]

Changed in version 3.3.

Decimals interact well with much of the rest of Python. Here is a small decimal floating-point flying circus:

   >>> data = list(map(Decimal, '1.34 1.87 3.45 2.35 1.00 0.03 9.25'.split()))
   >>> max(data)
   Decimal('9.25')
   >>> min(data)
   Decimal('0.03')
   >>> sorted(data)
   [Decimal('0.03'), Decimal('1.00'), Decimal('1.34'), Decimal('1.87'),
    Decimal('2.35'), Decimal('3.45'), Decimal('9.25')]
   >>> sum(data)
   Decimal('19.29')
   >>> a,b,c = data[:3]
   >>> str(a)
   '1.34'
   >>> float(a)
   1.34
   >>> round(a, 1)
   Decimal('1.3')
   >>> int(a)
   1
   >>> a * 5
   Decimal('6.70')
   >>> a * b
   Decimal('2.5058')
   >>> c % a
   Decimal('0.77')

And some mathematical functions are also available to Decimal:

   >>> getcontext().prec = 28
   >>> Decimal(2).sqrt()
   Decimal('1.414213562373095048801688724')
   >>> Decimal(1).exp()
   Decimal('2.718281828459045235360287471')
   >>> Decimal('10').ln()
   Decimal('2.302585092994045684017991455')
   >>> Decimal('10').log10()
   Decimal('1')

The "quantize()" method rounds a number to a fixed exponent.
This method is useful for monetary applications that often round results to a fixed number of places:

   >>> Decimal('7.325').quantize(Decimal('.01'), rounding=ROUND_DOWN)
   Decimal('7.32')
   >>> Decimal('7.325').quantize(Decimal('1.'), rounding=ROUND_UP)
   Decimal('8')

As shown above, the "getcontext()" function accesses the current context and allows the settings to be changed. This approach meets the needs of most applications.

For more advanced work, it may be useful to create alternate contexts using the Context() constructor. To make an alternate active, use the "setcontext()" function.

In accordance with the standard, the "decimal" module provides two ready-to-use standard contexts, "BasicContext" and "ExtendedContext". The former is especially useful for debugging because many of the traps are enabled:

   >>> myothercontext = Context(prec=60, rounding=ROUND_HALF_DOWN)
   >>> setcontext(myothercontext)
   >>> Decimal(1) / Decimal(7)
   Decimal('0.142857142857142857142857142857142857142857142857142857142857')

   >>> ExtendedContext
   Context(prec=9, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999,
           capitals=1, clamp=0, flags=[], traps=[])
   >>> setcontext(ExtendedContext)
   >>> Decimal(1) / Decimal(7)
   Decimal('0.142857143')
   >>> Decimal(42) / Decimal(0)
   Decimal('Infinity')

   >>> setcontext(BasicContext)
   >>> Decimal(42) / Decimal(0)
   Traceback (most recent call last):
     File "<stdin>", line 1, in -toplevel-
       Decimal(42) / Decimal(0)
   DivisionByZero: x / 0

Contexts also have signal flags for monitoring exceptional conditions encountered during computations. The flags remain set until explicitly cleared, so it is best to clear the flags before each set of monitored computations by using the "clear_flags()" method.

   >>> setcontext(ExtendedContext)
   >>> getcontext().clear_flags()
   >>> Decimal(355) / Decimal(113)
   Decimal('3.14159292')
   >>> getcontext()
   Context(prec=9, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999,
           capitals=1, clamp=0, flags=[Inexact, Rounded], traps=[])

The *flags* entry shows that the rational approximation to pi was rounded (digits beyond the context precision were thrown away) and that the result is inexact (some of the discarded digits were non-zero).

Individual traps are set using the dictionary in the "traps" attribute of a context:

   >>> setcontext(ExtendedContext)
   >>> Decimal(1) / Decimal(0)
   Decimal('Infinity')
   >>> getcontext().traps[DivisionByZero] = 1
   >>> Decimal(1) / Decimal(0)
   Traceback (most recent call last):
     File "<stdin>", line 1, in -toplevel-
       Decimal(1) / Decimal(0)
   DivisionByZero: x / 0

Most programs adjust the current context only once, at the beginning of the program. And, in many applications, data is converted to "Decimal" with a single cast inside a loop. With context set and decimals created, the bulk of the program manipulates the data no differently than with other Python numeric types.


Decimal objects
===============

class decimal.Decimal(value='0', context=None)

   Construct a new "Decimal" object based on *value*.

   *value* can be an integer, string, tuple, "float", or another "Decimal" object. If no *value* is given, returns "Decimal('0')". If *value* is a string, it should conform to the decimal numeric string syntax after leading and trailing whitespace characters, as well as underscores throughout, are removed:

      sign           ::=  '+' | '-'
      digit          ::=  '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
      indicator      ::=  'e' | 'E'
      digits         ::=  digit [digit]...
      decimal-part   ::=  digits '.' [digits] | ['.'] digits
      exponent-part  ::=  indicator [sign] digits
      infinity       ::=  'Infinity' | 'Inf'
      nan            ::=  'NaN' [digits] | 'sNaN' [digits]
      numeric-value  ::=  decimal-part [exponent-part] | infinity
      numeric-string ::=  [sign] numeric-value | [sign] nan

   Other Unicode decimal digits are also permitted where "digit" appears above. These include decimal digits from various other alphabets (for example, Arabic-Indic and Devanāgarī digits) along with the fullwidth digits "'\uff10'" through "'\uff19'".

   Case is not significant, so, for example, "inf", "Inf", "INFINITY", and "iNfINity" are all acceptable spellings for positive infinity.

   If *value* is a "tuple", it should have three components, a sign ("0" for positive or "1" for negative), a "tuple" of digits, and an integer exponent. For example, "Decimal((0, (1, 4, 1, 4), -3))" returns "Decimal('1.414')".

   If *value* is a "float", the binary floating-point value is losslessly converted to its exact decimal equivalent. This conversion can often require 53 or more digits of precision. For example, "Decimal(float('1.1'))" converts to "Decimal('1.100000000000000088817841970012523233890533447265625')".

   The *context* precision does not affect how many digits are stored. That is determined exclusively by the number of digits in *value*. For example, "Decimal('3.00000')" records all five zeros even if the context precision is only three.

   The purpose of the *context* argument is determining what to do if *value* is a malformed string. If the context traps "InvalidOperation", an exception is raised; otherwise, the constructor returns a new Decimal with the value of "NaN".

   Once constructed, "Decimal" objects are immutable.

   Changed in version 3.2: The argument to the constructor is now permitted to be a "float" instance.

   Changed in version 3.3: "float" arguments raise an exception if the "FloatOperation" trap is set. By default the trap is off.

   Changed in version 3.6: Underscores are allowed for grouping, as with integral and floating-point literals in code.

Decimal floating-point objects share many properties with the other built-in numeric types such as "float" and "int". All of the usual math operations and special methods apply. Likewise, decimal objects can be copied, pickled, printed, used as dictionary keys, used as set elements, compared, sorted, and coerced to another type (such as "float" or "int").

There are some small differences between arithmetic on Decimal objects and arithmetic on integers and floats. When the remainder operator "%" is applied to Decimal objects, the sign of the result is the sign of the *dividend* rather than the sign of the divisor:

   >>> (-7) % 4
   1
   >>> Decimal(-7) % Decimal(4)
   Decimal('-3')

The integer division operator "//" behaves analogously, returning the integer part of the true quotient (truncating towards zero) rather than its floor, so as to preserve the usual identity "x == (x // y) * y + x % y":

   >>> -7 // 4
   -2
   >>> Decimal(-7) // Decimal(4)
   Decimal('-1')

The "%" and "//" operators implement the "remainder" and "divide-integer" operations (respectively) as described in the specification.

Decimal objects cannot generally be combined with floats or instances of "fractions.Fraction" in arithmetic operations: an attempt to add a "Decimal" to a "float", for example, will raise a "TypeError". However, it is possible to use Python’s comparison operators to compare a "Decimal" instance "x" with another number "y". This avoids confusing results when doing equality comparisons between numbers of different types.
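Both behaviours are easy to see interactively (a small sketch; recall from above that the float "1.1" is not exactly representable in binary):

   >>> from decimal import Decimal
   >>> Decimal('1.1') + 1.1
   Traceback (most recent call last):
     ...
   TypeError: unsupported operand type(s) for +: 'decimal.Decimal' and 'float'
   >>> Decimal('1.1') == 1.1
   False
   >>> Decimal('1.1') < 1.2
   True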
Changed in version 3.2: Mixed-type comparisons between "Decimal" instances and other numeric types are now fully supported. In addition to the standard numeric properties, decimal floating- point objects also have a number of specialized methods: adjusted() Return the adjusted exponent after shifting out the coefficient’s rightmost digits until only the lead digit remains: "Decimal('321e+5').adjusted()" returns seven. Used for determining the position of the most significant digit with respect to the decimal point. as_integer_ratio() Return a pair "(n, d)" of integers that represent the given "Decimal" instance as a fraction, in lowest terms and with a positive denominator: >>> Decimal('-3.14').as_integer_ratio() (-157, 50) The conversion is exact. Raise OverflowError on infinities and ValueError on NaNs. Added in version 3.6. as_tuple() Return a *named tuple* representation of the number: "DecimalTuple(sign, digits, exponent)". canonical() Return the canonical encoding of the argument. Currently, the encoding of a "Decimal" instance is always canonical, so this operation returns its argument unchanged. compare(other, context=None) Compare the values of two Decimal instances. "compare()" returns a Decimal instance, and if either operand is a NaN then the result is a NaN: a or b is a NaN ==> Decimal('NaN') a < b ==> Decimal('-1') a == b ==> Decimal('0') a > b ==> Decimal('1') compare_signal(other, context=None) This operation is identical to the "compare()" method, except that all NaNs signal. That is, if neither operand is a signaling NaN then any quiet NaN operand is treated as though it were a signaling NaN. compare_total(other, context=None) Compare two operands using their abstract representation rather than their numerical value. Similar to the "compare()" method, but the result gives a total ordering on "Decimal" instances. Two "Decimal" instances with the same numeric value but different representations compare unequal in this ordering: >>> Decimal('12.0').compare_total(Decimal('12')) Decimal('-1') Quiet and signaling NaNs are also included in the total ordering. The result of this function is "Decimal('0')" if both operands have the same representation, "Decimal('-1')" if the first operand is lower in the total order than the second, and "Decimal('1')" if the first operand is higher in the total order than the second operand. See the specification for details of the total order. This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly. compare_total_mag(other, context=None) Compare two operands using their abstract representation rather than their value as in "compare_total()", but ignoring the sign of each operand. "x.compare_total_mag(y)" is equivalent to "x.copy_abs().compare_total(y.copy_abs())". This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly. conjugate() Just returns self, this method is only to comply with the Decimal Specification. copy_abs() Return the absolute value of the argument. This operation is unaffected by the context and is quiet: no flags are changed and no rounding is performed. copy_negate() Return the negation of the argument. This operation is unaffected by the context and is quiet: no flags are changed and no rounding is performed. 
copy_sign(other, context=None) Return a copy of the first operand with the sign set to be the same as the sign of the second operand. For example: >>> Decimal('2.3').copy_sign(Decimal('-1.5')) Decimal('-2.3') This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly. exp(context=None) Return the value of the (natural) exponential function "e**x" at the given number. The result is correctly rounded using the "ROUND_HALF_EVEN" rounding mode. >>> Decimal(1).exp() Decimal('2.718281828459045235360287471') >>> Decimal(321).exp() Decimal('2.561702493119680037517373933E+139') classmethod from_float(f) Alternative constructor that only accepts instances of "float" or "int". Note "Decimal.from_float(0.1)" is not the same as "Decimal('0.1')". Since 0.1 is not exactly representable in binary floating point, the value is stored as the nearest representable value which is "0x1.999999999999ap-4". That equivalent value in decimal is "0.1000000000000000055511151231257827021181583404541015625". Note: From Python 3.2 onwards, a "Decimal" instance can also be constructed directly from a "float". >>> Decimal.from_float(0.1) Decimal('0.1000000000000000055511151231257827021181583404541015625') >>> Decimal.from_float(float('nan')) Decimal('NaN') >>> Decimal.from_float(float('inf')) Decimal('Infinity') >>> Decimal.from_float(float('-inf')) Decimal('-Infinity') Added in version 3.1. fma(other, third, context=None) Fused multiply-add. Return self*other+third with no rounding of the intermediate product self*other. >>> Decimal(2).fma(3, 5) Decimal('11') is_canonical() Return "True" if the argument is canonical and "False" otherwise. Currently, a "Decimal" instance is always canonical, so this operation always returns "True". is_finite() Return "True" if the argument is a finite number, and "False" if the argument is an infinity or a NaN. is_infinite() Return "True" if the argument is either positive or negative infinity and "False" otherwise. is_nan() Return "True" if the argument is a (quiet or signaling) NaN and "False" otherwise. is_normal(context=None) Return "True" if the argument is a *normal* finite number. Return "False" if the argument is zero, subnormal, infinite or a NaN. is_qnan() Return "True" if the argument is a quiet NaN, and "False" otherwise. is_signed() Return "True" if the argument has a negative sign and "False" otherwise. Note that zeros and NaNs can both carry signs. is_snan() Return "True" if the argument is a signaling NaN and "False" otherwise. is_subnormal(context=None) Return "True" if the argument is subnormal, and "False" otherwise. is_zero() Return "True" if the argument is a (positive or negative) zero and "False" otherwise. ln(context=None) Return the natural (base e) logarithm of the operand. The result is correctly rounded using the "ROUND_HALF_EVEN" rounding mode. log10(context=None) Return the base ten logarithm of the operand. The result is correctly rounded using the "ROUND_HALF_EVEN" rounding mode. logb(context=None) For a nonzero number, return the adjusted exponent of its operand as a "Decimal" instance. If the operand is a zero then "Decimal('-Infinity')" is returned and the "DivisionByZero" flag is raised. If the operand is an infinity then "Decimal('Infinity')" is returned. logical_and(other, context=None) "logical_and()" is a logical operation which takes two *logical operands* (see Logical operands). 
The result is the digit-wise "and" of the two operands. logical_invert(context=None) "logical_invert()" is a logical operation. The result is the digit-wise inversion of the operand. logical_or(other, context=None) "logical_or()" is a logical operation which takes two *logical operands* (see Logical operands). The result is the digit-wise "or" of the two operands. logical_xor(other, context=None) "logical_xor()" is a logical operation which takes two *logical operands* (see Logical operands). The result is the digit-wise exclusive or of the two operands. max(other, context=None) Like "max(self, other)" except that the context rounding rule is applied before returning and that "NaN" values are either signaled or ignored (depending on the context and whether they are signaling or quiet). max_mag(other, context=None) Similar to the "max()" method, but the comparison is done using the absolute values of the operands. min(other, context=None) Like "min(self, other)" except that the context rounding rule is applied before returning and that "NaN" values are either signaled or ignored (depending on the context and whether they are signaling or quiet). min_mag(other, context=None) Similar to the "min()" method, but the comparison is done using the absolute values of the operands. next_minus(context=None) Return the largest number representable in the given context (or in the current thread’s context if no context is given) that is smaller than the given operand. next_plus(context=None) Return the smallest number representable in the given context (or in the current thread’s context if no context is given) that is larger than the given operand. next_toward(other, context=None) If the two operands are unequal, return the number closest to the first operand in the direction of the second operand. If both operands are numerically equal, return a copy of the first operand with the sign set to be the same as the sign of the second operand. normalize(context=None) Used for producing canonical values of an equivalence class within either the current context or the specified context. This has the same semantics as the unary plus operation, except that if the final result is finite it is reduced to its simplest form, with all trailing zeros removed and its sign preserved. That is, while the coefficient is non-zero and a multiple of ten the coefficient is divided by ten and the exponent is incremented by 1. Otherwise (the coefficient is zero) the exponent is set to 0. In all cases the sign is unchanged. For example, "Decimal('32.100')" and "Decimal('0.321000e+2')" both normalize to the equivalent value "Decimal('32.1')". Note that rounding is applied *before* reducing to simplest form. In the latest versions of the specification, this operation is also known as "reduce". number_class(context=None) Return a string describing the *class* of the operand. The returned value is one of the following ten strings. * ""-Infinity"", indicating that the operand is negative infinity. * ""-Normal"", indicating that the operand is a negative normal number. * ""-Subnormal"", indicating that the operand is negative and subnormal. * ""-Zero"", indicating that the operand is a negative zero. * ""+Zero"", indicating that the operand is a positive zero. * ""+Subnormal"", indicating that the operand is positive and subnormal. * ""+Normal"", indicating that the operand is a positive normal number. * ""+Infinity"", indicating that the operand is positive infinity. 
* ""NaN"", indicating that the operand is a quiet NaN (Not a Number). * ""sNaN"", indicating that the operand is a signaling NaN. quantize(exp, rounding=None, context=None) Return a value equal to the first operand after rounding and having the exponent of the second operand. >>> Decimal('1.41421356').quantize(Decimal('1.000')) Decimal('1.414') Unlike other operations, if the length of the coefficient after the quantize operation would be greater than precision, then an "InvalidOperation" is signaled. This guarantees that, unless there is an error condition, the quantized exponent is always equal to that of the right-hand operand. Also unlike other operations, quantize never signals Underflow, even if the result is subnormal and inexact. If the exponent of the second operand is larger than that of the first then rounding may be necessary. In this case, the rounding mode is determined by the "rounding" argument if given, else by the given "context" argument; if neither argument is given the rounding mode of the current thread’s context is used. An error is returned whenever the resulting exponent is greater than "Emax" or less than "Etiny()". radix() Return "Decimal(10)", the radix (base) in which the "Decimal" class does all its arithmetic. Included for compatibility with the specification. remainder_near(other, context=None) Return the remainder from dividing *self* by *other*. This differs from "self % other" in that the sign of the remainder is chosen so as to minimize its absolute value. More precisely, the return value is "self - n * other" where "n" is the integer nearest to the exact value of "self / other", and if two integers are equally near then the even one is chosen. If the result is zero then its sign will be the sign of *self*. >>> Decimal(18).remainder_near(Decimal(10)) Decimal('-2') >>> Decimal(25).remainder_near(Decimal(10)) Decimal('5') >>> Decimal(35).remainder_near(Decimal(10)) Decimal('-5') rotate(other, context=None) Return the result of rotating the digits of the first operand by an amount specified by the second operand. The second operand must be an integer in the range -precision through precision. The absolute value of the second operand gives the number of places to rotate. If the second operand is positive then rotation is to the left; otherwise rotation is to the right. The coefficient of the first operand is padded on the left with zeros to length precision if necessary. The sign and exponent of the first operand are unchanged. same_quantum(other, context=None) Test whether self and other have the same exponent or whether both are "NaN". This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly. scaleb(other, context=None) Return the first operand with exponent adjusted by the second. Equivalently, return the first operand multiplied by "10**other". The second operand must be an integer. shift(other, context=None) Return the result of shifting the digits of the first operand by an amount specified by the second operand. The second operand must be an integer in the range -precision through precision. The absolute value of the second operand gives the number of places to shift. If the second operand is positive then the shift is to the left; otherwise the shift is to the right. Digits shifted into the coefficient are zeros. The sign and exponent of the first operand are unchanged. 
sqrt(context=None) Return the square root of the argument to full precision. to_eng_string(context=None) Convert to a string, using engineering notation if an exponent is needed. Engineering notation has an exponent which is a multiple of 3. This can leave up to 3 digits to the left of the decimal place and may require the addition of either one or two trailing zeros. For example, this converts "Decimal('123E+1')" to "Decimal('1.23E+3')". to_integral(rounding=None, context=None) Identical to the "to_integral_value()" method. The "to_integral" name has been kept for compatibility with older versions. to_integral_exact(rounding=None, context=None) Round to the nearest integer, signaling "Inexact" or "Rounded" as appropriate if rounding occurs. The rounding mode is determined by the "rounding" parameter if given, else by the given "context". If neither parameter is given then the rounding mode of the current context is used. to_integral_value(rounding=None, context=None) Round to the nearest integer without signaling "Inexact" or "Rounded". If given, applies *rounding*; otherwise, uses the rounding method in either the supplied *context* or the current context. Decimal numbers can be rounded using the "round()" function: round(number) round(number, ndigits) If *ndigits* is not given or "None", returns the nearest "int" to *number*, rounding ties to even, and ignoring the rounding mode of the "Decimal" context. Raises "OverflowError" if *number* is an infinity or "ValueError" if it is a (quiet or signaling) NaN. If *ndigits* is an "int", the context’s rounding mode is respected and a "Decimal" representing *number* rounded to the nearest multiple of "Decimal('1E-ndigits')" is returned; in this case, "round(number, ndigits)" is equivalent to "self.quantize(Decimal('1E-ndigits'))". Returns "Decimal('NaN')" if *number* is a quiet NaN. Raises "InvalidOperation" if *number* is an infinity, a signaling NaN, or if the length of the coefficient after the quantize operation would be greater than the current context’s precision. In other words, for the non-corner cases: * if *ndigits* is positive, return *number* rounded to *ndigits* decimal places; * if *ndigits* is zero, return *number* rounded to the nearest integer; * if *ndigits* is negative, return *number* rounded to the nearest multiple of "10**abs(ndigits)". For example: >>> from decimal import Decimal, getcontext, ROUND_DOWN >>> getcontext().rounding = ROUND_DOWN >>> round(Decimal('3.75')) # context rounding ignored 4 >>> round(Decimal('3.5')) # round-ties-to-even 4 >>> round(Decimal('3.75'), 0) # uses the context rounding Decimal('3') >>> round(Decimal('3.75'), 1) Decimal('3.7') >>> round(Decimal('3.75'), -1) Decimal('0E+1') Logical operands ---------------- The "logical_and()", "logical_invert()", "logical_or()", and "logical_xor()" methods expect their arguments to be *logical operands*. A *logical operand* is a "Decimal" instance whose exponent and sign are both zero, and whose digits are all either "0" or "1". Context objects =============== Contexts are environments for arithmetic operations. They govern precision, set rules for rounding, determine which signals are treated as exceptions, and limit the range for exponents. Each thread has its own current context which is accessed or changed using the "getcontext()" and "setcontext()" functions: decimal.getcontext() Return the current context for the active thread. decimal.setcontext(c) Set the current context for the active thread to *c*. 
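For example (a minimal sketch), a program can install an alternate context and later restore the one it replaced:

   from decimal import Context, Decimal, getcontext, setcontext

   saved = getcontext()              # the calling thread's current context
   setcontext(Context(prec=10))      # install a 10-digit context
   print(Decimal(1) / Decimal(7))    # 0.1428571429
   setcontext(saved)                 # put the original context back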
You can also use the "with" statement and the "localcontext()" function to temporarily change the active context.

decimal.localcontext(ctx=None, **kwargs)

   Return a context manager that will set the current context for the active thread to a copy of *ctx* on entry to the with-statement and restore the previous context when exiting the with-statement. If no context is specified, a copy of the current context is used. The *kwargs* argument is used to set the attributes of the new context.

   For example, the following code sets the current decimal precision to 42 places, performs a calculation, and then automatically restores the previous context:

      from decimal import localcontext

      with localcontext() as ctx:
          ctx.prec = 42   # Perform a high precision calculation
          s = calculate_something()
      s = +s  # Round the final result back to the default precision

   Using keyword arguments, the code would be the following:

      from decimal import localcontext

      with localcontext(prec=42) as ctx:
          s = calculate_something()
      s = +s

   Raises "TypeError" if *kwargs* supplies an attribute that "Context" doesn’t support. Raises either "TypeError" or "ValueError" if *kwargs* supplies an invalid value for an attribute.

   Changed in version 3.11: "localcontext()" now supports setting context attributes through the use of keyword arguments.

New contexts can also be created using the "Context" constructor described below. In addition, the module provides three pre-made contexts:

decimal.BasicContext

   This is a standard context defined by the General Decimal Arithmetic Specification. Precision is set to nine. Rounding is set to "ROUND_HALF_UP". All flags are cleared. All traps are enabled (treated as exceptions) except "Inexact", "Rounded", and "Subnormal".

   Because many of the traps are enabled, this context is useful for debugging.

decimal.ExtendedContext

   This is a standard context defined by the General Decimal Arithmetic Specification. Precision is set to nine. Rounding is set to "ROUND_HALF_EVEN". All flags are cleared. No traps are enabled (so that exceptions are not raised during computations).

   Because the traps are disabled, this context is useful for applications that prefer to have a result value of "NaN" or "Infinity" instead of raising exceptions. This allows an application to complete a run in the presence of conditions that would otherwise halt the program.

decimal.DefaultContext

   This context is used by the "Context" constructor as a prototype for new contexts. Changing a field (such as precision) has the effect of changing the default for new contexts created by the "Context" constructor.

   This context is most useful in multi-threaded environments. Changing one of the fields before threads are started has the effect of setting system-wide defaults. Changing the fields after threads have started is not recommended as it would require thread synchronization to prevent race conditions.

   In single-threaded environments, it is preferable to not use this context at all. Instead, simply create contexts explicitly as described below.

   The default values are "Context.prec"="28", "Context.rounding"="ROUND_HALF_EVEN", and enabled traps for "Overflow", "InvalidOperation", and "DivisionByZero".

In addition to the three supplied contexts, new contexts can be created with the "Context" constructor.

class decimal.Context(prec=None, rounding=None, Emin=None, Emax=None, capitals=None, clamp=None, flags=None, traps=None)

   Creates a new context. If a field is not specified or is "None", the default values are copied from the "DefaultContext".
If the *flags* field is not specified or is "None", all flags are cleared. prec An integer in the range ["1", "MAX_PREC"] that sets the precision for arithmetic operations in the context. rounding One of the constants listed in the section Rounding Modes. traps flags Lists of any signals to be set. Generally, new contexts should only set traps and leave the flags clear. Emin Emax Integers specifying the outer limits allowable for exponents. *Emin* must be in the range ["MIN_EMIN", "0"], *Emax* in the range ["0", "MAX_EMAX"]. capitals Either "0" or "1" (the default). If set to "1", exponents are printed with a capital "E"; otherwise, a lowercase "e" is used: "Decimal('6.02e+23')". clamp Either "0" (the default) or "1". If set to "1", the exponent "e" of a "Decimal" instance representable in this context is strictly limited to the range "Emin - prec + 1 <= e <= Emax - prec + 1". If *clamp* is "0" then a weaker condition holds: the adjusted exponent of the "Decimal" instance is at most "Emax". When *clamp* is "1", a large normal number will, where possible, have its exponent reduced and a corresponding number of zeros added to its coefficient, in order to fit the exponent constraints; this preserves the value of the number but loses information about significant trailing zeros. For example: >>> Context(prec=6, Emax=999, clamp=1).create_decimal('1.23e999') Decimal('1.23000E+999') A *clamp* value of "1" allows compatibility with the fixed-width decimal interchange formats specified in IEEE 754. The "Context" class defines several general purpose methods as well as a large number of methods for doing arithmetic directly in a given context. In addition, for each of the "Decimal" methods described above (with the exception of the "adjusted()" and "as_tuple()" methods) there is a corresponding "Context" method. For example, for a "Context" instance "C" and "Decimal" instance "x", "C.exp(x)" is equivalent to "x.exp(context=C)". Each "Context" method accepts a Python integer (an instance of "int") anywhere that a Decimal instance is accepted. clear_flags() Resets all of the flags to "0". clear_traps() Resets all of the traps to "0". Added in version 3.3. copy() Return a duplicate of the context. copy_decimal(num) Return a copy of the Decimal instance num. create_decimal(num) Creates a new Decimal instance from *num* but using *self* as context. Unlike the "Decimal" constructor, the context precision, rounding method, flags, and traps are applied to the conversion. This is useful because constants are often given to a greater precision than is needed by the application. Another benefit is that rounding immediately eliminates unintended effects from digits beyond the current precision. In the following example, using unrounded inputs means that adding zero to a sum can change the result: >>> getcontext().prec = 3 >>> Decimal('3.4445') + Decimal('1.0023') Decimal('4.45') >>> Decimal('3.4445') + Decimal(0) + Decimal('1.0023') Decimal('4.44') This method implements the to-number operation of the IBM specification. If the argument is a string, no leading or trailing whitespace or underscores are permitted. create_decimal_from_float(f) Creates a new Decimal instance from a float *f* but rounding using *self* as the context. Unlike the "Decimal.from_float()" class method, the context precision, rounding method, flags, and traps are applied to the conversion. 
>>> context = Context(prec=5, rounding=ROUND_DOWN) >>> context.create_decimal_from_float(math.pi) Decimal('3.1415') >>> context = Context(prec=5, traps=[Inexact]) >>> context.create_decimal_from_float(math.pi) Traceback (most recent call last): ... decimal.Inexact: None Added in version 3.1. Etiny() Returns a value equal to "Emin - prec + 1" which is the minimum exponent value for subnormal results. When underflow occurs, the exponent is set to "Etiny". Etop() Returns a value equal to "Emax - prec + 1". The usual approach to working with decimals is to create "Decimal" instances and then apply arithmetic operations which take place within the current context for the active thread. An alternative approach is to use context methods for calculating within a specific context. The methods are similar to those for the "Decimal" class and are only briefly recounted here. abs(x) Returns the absolute value of *x*. add(x, y) Return the sum of *x* and *y*. canonical(x) Returns the same Decimal object *x*. compare(x, y) Compares *x* and *y* numerically. compare_signal(x, y) Compares the values of the two operands numerically. compare_total(x, y) Compares two operands using their abstract representation. compare_total_mag(x, y) Compares two operands using their abstract representation, ignoring sign. copy_abs(x) Returns a copy of *x* with the sign set to 0. copy_negate(x) Returns a copy of *x* with the sign inverted. copy_sign(x, y) Copies the sign from *y* to *x*. divide(x, y) Return *x* divided by *y*. divide_int(x, y) Return *x* divided by *y*, truncated to an integer. divmod(x, y) Divides two numbers and returns the integer part of the result. exp(x) Returns "e ** x". fma(x, y, z) Returns *x* multiplied by *y*, plus *z*. is_canonical(x) Returns "True" if *x* is canonical; otherwise returns "False". is_finite(x) Returns "True" if *x* is finite; otherwise returns "False". is_infinite(x) Returns "True" if *x* is infinite; otherwise returns "False". is_nan(x) Returns "True" if *x* is a qNaN or sNaN; otherwise returns "False". is_normal(x) Returns "True" if *x* is a normal number; otherwise returns "False". is_qnan(x) Returns "True" if *x* is a quiet NaN; otherwise returns "False". is_signed(x) Returns "True" if *x* is negative; otherwise returns "False". is_snan(x) Returns "True" if *x* is a signaling NaN; otherwise returns "False". is_subnormal(x) Returns "True" if *x* is subnormal; otherwise returns "False". is_zero(x) Returns "True" if *x* is a zero; otherwise returns "False". ln(x) Returns the natural (base e) logarithm of *x*. log10(x) Returns the base 10 logarithm of *x*. logb(x) Returns the exponent of the magnitude of the operand’s MSD. logical_and(x, y) Applies the logical operation *and* between each operand’s digits. logical_invert(x) Invert all the digits in *x*. logical_or(x, y) Applies the logical operation *or* between each operand’s digits. logical_xor(x, y) Applies the logical operation *xor* between each operand’s digits. max(x, y) Compares two values numerically and returns the maximum. max_mag(x, y) Compares the values numerically with their sign ignored. min(x, y) Compares two values numerically and returns the minimum. min_mag(x, y) Compares the values numerically with their sign ignored. minus(x) Minus corresponds to the unary prefix minus operator in Python. multiply(x, y) Return the product of *x* and *y*. next_minus(x) Returns the largest representable number smaller than *x*. next_plus(x) Returns the smallest representable number larger than *x*. 
Constants ========= The constants in this section are only relevant for the C module. They are also included in the pure Python version for compatibility.
+-----------------------+-----------------------+---------------------------------+
|                       | 32-bit                | 64-bit                          |
|=======================|=======================|=================================|
| decimal.MAX_PREC      | "425000000"           | "999999999999999999"            |
+-----------------------+-----------------------+---------------------------------+
| decimal.MAX_EMAX      | "425000000"           | "999999999999999999"            |
+-----------------------+-----------------------+---------------------------------+
| decimal.MIN_EMIN      | "-425000000"          | "-999999999999999999"           |
+-----------------------+-----------------------+---------------------------------+
| decimal.MIN_ETINY     | "-849999999"          | "-1999999999999999997"          |
+-----------------------+-----------------------+---------------------------------+

decimal.HAVE_THREADS The value is "True". Deprecated, because Python now always has threads. Deprecated since version 3.9. decimal.HAVE_CONTEXTVAR The default value is "True". If Python is "configured using the --without-decimal-contextvar option", the C version uses a thread-local rather than a coroutine-local context and the value is "False". This is slightly faster in some nested context scenarios. Added in version 3.8.3. Rounding modes ============== decimal.ROUND_CEILING Round towards "Infinity". decimal.ROUND_DOWN Round towards zero. decimal.ROUND_FLOOR Round towards "-Infinity". decimal.ROUND_HALF_DOWN Round to nearest with ties going towards zero. decimal.ROUND_HALF_EVEN Round to nearest with ties going to nearest even integer. decimal.ROUND_HALF_UP Round to nearest with ties going away from zero. decimal.ROUND_UP Round away from zero. decimal.ROUND_05UP Round away from zero if last digit after rounding towards zero would have been 0 or 5; otherwise round towards zero. Signals ======= Signals represent conditions that arise during computation. Each corresponds to one context flag and one context trap enabler. The context flag is set whenever the condition is encountered. After the computation, flags may be checked for informational purposes (for instance, to determine whether a computation was exact). After checking the flags, be sure to clear all flags before starting the next computation. If the context's trap enabler is set for the signal, then the condition causes a Python exception to be raised. For example, if the "DivisionByZero" trap is set, then a "DivisionByZero" exception is raised upon encountering the condition.
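For instance, here is a brief sketch of the flag-and-trap workflow just described (the exact repr of trapped signals may vary between the C and pure Python versions):

   >>> from decimal import Decimal, getcontext, DivisionByZero, Inexact
   >>> ctx = getcontext()
   >>> ctx.clear_flags()
   >>> Decimal(1) / Decimal(7)             # inexact at the default precision
   Decimal('0.1428571428571428571428571429')
   >>> ctx.flags[Inexact]                  # the flag records the condition
   True
   >>> ctx.traps[DivisionByZero] = False   # untrapped: return Infinity instead of raising
   >>> Decimal(1) / Decimal(0)
   Decimal('Infinity')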
class decimal.Clamped Altered an exponent to fit representation constraints. Typically, clamping occurs when an exponent falls outside the context's "Emin" and "Emax" limits. If possible, the exponent is reduced to fit by adding zeros to the coefficient. class decimal.DecimalException Base class for other signals and a subclass of "ArithmeticError". class decimal.DivisionByZero Signals the division of a non-infinite number by zero. Can occur with division, modulo division, or when raising a number to a negative power. If this signal is not trapped, returns "Infinity" or "-Infinity" with the sign determined by the inputs to the calculation. class decimal.Inexact Indicates that rounding occurred and the result is not exact. Signals when non-zero digits were discarded during rounding. The rounded result is returned. The signal flag or trap is used to detect when results are inexact. class decimal.InvalidOperation An invalid operation was performed. Indicates that an operation was requested that does not make sense. If not trapped, returns "NaN". Possible causes include:

   Infinity - Infinity
   0 * Infinity
   Infinity / Infinity
   x % 0
   Infinity % x
   sqrt(-x) and x > 0
   0 ** 0
   x ** (non-integer)
   x ** Infinity

class decimal.Overflow Numerical overflow. Indicates the exponent is larger than "Context.Emax" after rounding has occurred. If not trapped, the result depends on the rounding mode, either pulling inward to the largest representable finite number or rounding outward to "Infinity". In either case, "Inexact" and "Rounded" are also signaled. class decimal.Rounded Rounding occurred though possibly no information was lost. Signaled whenever rounding discards digits; even if those digits are zero (such as rounding "5.00" to "5.0"). If not trapped, returns the result unchanged. This signal is used to detect loss of significant digits. class decimal.Subnormal Exponent was lower than "Emin" prior to rounding. Occurs when an operation result is subnormal (the exponent is too small). If not trapped, returns the result unchanged. class decimal.Underflow Numerical underflow with result rounded to zero. Occurs when a subnormal result is pushed to zero by rounding. "Inexact" and "Subnormal" are also signaled. class decimal.FloatOperation Enable stricter semantics for mixing floats and Decimals. If the signal is not trapped (default), mixing floats and Decimals is permitted in the "Decimal" constructor, "create_decimal()" and all comparison operators. Both conversion and comparisons are exact. Any occurrence of a mixed operation is silently recorded by setting "FloatOperation" in the context flags. Explicit conversions with "from_float()" or "create_decimal_from_float()" do not set the flag. Otherwise (the signal is trapped), only equality comparisons and explicit conversions are silent. All other mixed operations raise "FloatOperation". The following table summarizes the hierarchy of signals:

   exceptions.ArithmeticError(exceptions.Exception)
       DecimalException
           Clamped
           DivisionByZero(DecimalException, exceptions.ZeroDivisionError)
           Inexact
               Overflow(Inexact, Rounded)
               Underflow(Inexact, Rounded, Subnormal)
           InvalidOperation
           Rounded
           Subnormal
           FloatOperation(DecimalException, exceptions.TypeError)

Floating-point notes ==================== Mitigating round-off error with increased precision --------------------------------------------------- The use of decimal floating point eliminates decimal representation error (making it possible to represent "0.1" exactly); however, some operations can still incur round-off error when non-zero digits exceed the fixed precision. The effects of round-off error can be amplified by the addition or subtraction of nearly offsetting quantities resulting in loss of significance. Knuth provides two instructive examples where rounded floating-point arithmetic with insufficient precision causes the breakdown of the associative and distributive properties of addition:

   # Examples from Seminumerical Algorithms, Section 4.2.2.
   >>> from decimal import Decimal, getcontext
   >>> getcontext().prec = 8
   >>> u, v, w = Decimal(11111113), Decimal(-11111111), Decimal('7.51111111')
   >>> (u + v) + w
   Decimal('9.5111111')
   >>> u + (v + w)
   Decimal('10')
   >>>
   >>> u, v, w = Decimal(20000), Decimal(-6), Decimal('6.0000003')
   >>> (u*v) + (u*w)
   Decimal('0.01')
   >>> u * (v+w)
   Decimal('0.0060000')

The "decimal" module makes it possible to restore the identities by expanding the precision sufficiently to avoid loss of significance:

   >>> getcontext().prec = 20
   >>> u, v, w = Decimal(11111113), Decimal(-11111111), Decimal('7.51111111')
   >>> (u + v) + w
   Decimal('9.51111111')
   >>> u + (v + w)
   Decimal('9.51111111')
   >>>
   >>> u, v, w = Decimal(20000), Decimal(-6), Decimal('6.0000003')
   >>> (u*v) + (u*w)
   Decimal('0.0060000')
   >>> u * (v+w)
   Decimal('0.0060000')

Special values -------------- The number system for the "decimal" module provides special values including "NaN", "sNaN", "-Infinity", "Infinity", and two zeros, "+0" and "-0". Infinities can be constructed directly with: "Decimal('Infinity')". Also, they can arise from dividing by zero when the "DivisionByZero" signal is not trapped. Likewise, when the "Overflow" signal is not trapped, infinity can result from rounding beyond the limits of the largest representable number. The infinities are signed (affine) and can be used in arithmetic operations where they get treated as very large, indeterminate numbers. For instance, adding a constant to infinity gives another infinite result. Some operations are indeterminate and return "NaN", or if the "InvalidOperation" signal is trapped, raise an exception. For example, "0/0" returns "NaN" which means "not a number". This variety of "NaN" is quiet and, once created, will flow through other computations always resulting in another "NaN". This behavior can be useful for a series of computations that occasionally have missing inputs — it allows the calculation to proceed while flagging specific results as invalid. A variant is "sNaN" which signals rather than remaining quiet after every operation. This is a useful return value when an invalid result needs to interrupt a calculation for special handling. The behavior of Python's comparison operators can be a little surprising where a "NaN" is involved. A test for equality where one of the operands is a quiet or signaling "NaN" always returns "False" (even when doing "Decimal('NaN')==Decimal('NaN')"), while a test for inequality always returns "True". An attempt to compare two Decimals using any of the "<", "<=", ">" or ">=" operators will raise the "InvalidOperation" signal if either operand is a "NaN", and return "False" if this signal is not trapped. Note that the General Decimal Arithmetic specification does not specify the behavior of direct comparisons; these rules for comparisons involving a "NaN" were taken from the IEEE 854 standard (see Table 3 in section 5.7). To ensure strict standards-compliance, use the "compare()" and "compare_signal()" methods instead. The signed zeros can result from calculations that underflow. They keep the sign that would have resulted if the calculation had been carried out to greater precision. Since their magnitude is zero, both positive and negative zeros are treated as equal and their sign is informational. In addition to the two signed zeros which are distinct yet equal, there are various representations of zero with differing precisions yet equivalent in value. This takes a bit of getting used to.
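For example, a short sketch of how the distinct zeros nevertheless compare equal:

   >>> Decimal('0') == Decimal('-0') == Decimal('0.000')   # equal in value
   True
   >>> Decimal('-0'), Decimal('0.000')                     # yet distinct representations
   (Decimal('-0'), Decimal('0.000'))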
For an eye accustomed to normalized floating-point representations, it is not immediately obvious that the following calculation returns a value equal to zero:

   >>> 1 / Decimal('Infinity')
   Decimal('0E-1000026')

Working with threads ==================== The "getcontext()" function accesses a different "Context" object for each thread. Having separate thread contexts means that threads may make changes (such as "getcontext().prec=10") without interfering with other threads. Likewise, the "setcontext()" function automatically assigns its target to the current thread. If "setcontext()" has not been called before "getcontext()", then "getcontext()" will automatically create a new context for use in the current thread. The new context is copied from a prototype context called *DefaultContext*. To control the defaults so that each thread will use the same values throughout the application, directly modify the *DefaultContext* object. This should be done *before* any threads are started so that there won't be a race condition between threads calling "getcontext()". For example:

   # Set applicationwide defaults for all threads about to be launched
   DefaultContext.prec = 12
   DefaultContext.rounding = ROUND_DOWN
   DefaultContext.traps = ExtendedContext.traps.copy()
   DefaultContext.traps[InvalidOperation] = 1
   setcontext(DefaultContext)

   # Afterwards, the threads can be started
   t1.start()
   t2.start()
   t3.start()
    . . .

Recipes ======= Here are a few recipes that serve as utility functions and that demonstrate ways to work with the "Decimal" class:

   def moneyfmt(value, places=2, curr='', sep=',', dp='.',
                pos='', neg='-', trailneg=''):
       """Convert Decimal to a money formatted string.

       places:  required number of places after the decimal point
       curr:    optional currency symbol before the sign (may be blank)
       sep:     optional grouping separator (comma, period, space, or blank)
       dp:      decimal point indicator (comma or period)
                only specify as blank when places is zero
       pos:     optional sign for positive numbers: '+', space or blank
       neg:     optional sign for negative numbers: '-', '(', space or blank
       trailneg:optional trailing minus indicator:  '-', ')', space or blank

       >>> d = Decimal('-1234567.8901')
       >>> moneyfmt(d, curr='$')
       '-$1,234,567.89'
       >>> moneyfmt(d, places=0, sep='.', dp='', neg='', trailneg='-')
       '1.234.568-'
       >>> moneyfmt(d, curr='$', neg='(', trailneg=')')
       '($1,234,567.89)'
       >>> moneyfmt(Decimal(123456789), sep=' ')
       '123 456 789.00'
       >>> moneyfmt(Decimal('-0.02'), neg='<', trailneg='>')
       '<0.02>'

       """
       q = Decimal(10) ** -places      # 2 places --> '0.01'
       sign, digits, exp = value.quantize(q).as_tuple()
       result = []
       digits = list(map(str, digits))
       build, next = result.append, digits.pop
       if sign:
           build(trailneg)
       for i in range(places):
           build(next() if digits else '0')
       if places:
           build(dp)
       if not digits:
           build('0')
       i = 0
       while digits:
           build(next())
           i += 1
           if i == 3 and digits:
               i = 0
               build(sep)
       build(curr)
       build(neg if sign else pos)
       return ''.join(reversed(result))

   def pi():
       """Compute Pi to the current precision.

       >>> print(pi())
       3.141592653589793238462643383

       """
       getcontext().prec += 2  # extra digits for intermediate steps
       three = Decimal(3)      # substitute "three=3.0" for regular floats
       lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
       while s != lasts:
           lasts = s
           n, na = n+na, na+8
           d, da = d+da, da+32
           t = (t * n) / d
           s += t
       getcontext().prec -= 2
       return +s               # unary plus applies the new precision
   def exp(x):
       """Return e raised to the power of x.  Result type matches input type.

       >>> print(exp(Decimal(1)))
       2.718281828459045235360287471
       >>> print(exp(Decimal(2)))
       7.389056098930650227230427461
       >>> print(exp(2.0))
       7.38905609893
       >>> print(exp(2+0j))
       (7.38905609893+0j)

       """
       getcontext().prec += 2
       i, lasts, s, fact, num = 0, 0, 1, 1, 1
       while s != lasts:
           lasts = s
           i += 1
           fact *= i
           num *= x
           s += num / fact
       getcontext().prec -= 2
       return +s

   def cos(x):
       """Return the cosine of x as measured in radians.

       The Taylor series approximation works best for a small value of x.
       For larger values, first compute x = x % (2 * pi).

       >>> print(cos(Decimal('0.5')))
       0.8775825618903727161162815826
       >>> print(cos(0.5))
       0.87758256189
       >>> print(cos(0.5+0j))
       (0.87758256189+0j)

       """
       getcontext().prec += 2
       i, lasts, s, fact, num, sign = 0, 0, 1, 1, 1, 1
       while s != lasts:
           lasts = s
           i += 2
           fact *= i * (i-1)
           num *= x * x
           sign *= -1
           s += num / fact * sign
       getcontext().prec -= 2
       return +s

   def sin(x):
       """Return the sine of x as measured in radians.

       The Taylor series approximation works best for a small value of x.
       For larger values, first compute x = x % (2 * pi).

       >>> print(sin(Decimal('0.5')))
       0.4794255386042030002732879352
       >>> print(sin(0.5))
       0.479425538604
       >>> print(sin(0.5+0j))
       (0.479425538604+0j)

       """
       getcontext().prec += 2
       i, lasts, s, fact, num, sign = 1, 0, x, 1, x, 1
       while s != lasts:
           lasts = s
           i += 2
           fact *= i * (i-1)
           num *= x * x
           sign *= -1
           s += num / fact * sign
       getcontext().prec -= 2
       return +s

Decimal FAQ =========== Q. It is cumbersome to type "decimal.Decimal('1234.5')". Is there a way to minimize typing when using the interactive interpreter? A. Some users abbreviate the constructor to just a single letter:

   >>> D = decimal.Decimal
   >>> D('1.23') + D('3.45')
   Decimal('4.68')

Q. In a fixed-point application with two decimal places, some inputs have many places and need to be rounded. Others are not supposed to have excess digits and need to be validated. What methods should be used? A. The "quantize()" method rounds to a fixed number of decimal places. If the "Inexact" trap is set, it is also useful for validation:

   >>> TWOPLACES = Decimal(10) ** -2       # same as Decimal('0.01')
   >>> # Round to two places
   >>> Decimal('3.214').quantize(TWOPLACES)
   Decimal('3.21')
   >>> # Validate that a number does not exceed two places
   >>> Decimal('3.21').quantize(TWOPLACES, context=Context(traps=[Inexact]))
   Decimal('3.21')
   >>> Decimal('3.214').quantize(TWOPLACES, context=Context(traps=[Inexact]))
   Traceback (most recent call last):
      ...
   Inexact: None

Q. Once I have valid two place inputs, how do I maintain that invariant throughout an application? A. Some operations like addition, subtraction, and multiplication by an integer will automatically preserve fixed point. Other operations, like division and non-integer multiplication, will change the number of decimal places and need to be followed up with a "quantize()" step:

   >>> a = Decimal('102.72')           # Initial fixed-point values
   >>> b = Decimal('3.17')
   >>> a + b                           # Addition preserves fixed-point
   Decimal('105.89')
   >>> a - b
   Decimal('99.55')
   >>> a * 42                          # So does integer multiplication
   Decimal('4314.24')
   >>> (a * b).quantize(TWOPLACES)     # Must quantize non-integer multiplication
   Decimal('325.62')
   >>> (b / a).quantize(TWOPLACES)     # And quantize division
   Decimal('0.03')

In developing fixed-point applications, it is convenient to define functions to handle the "quantize()" step:

   >>> def mul(x, y, fp=TWOPLACES):
   ...     return (x * y).quantize(fp)
   ...
   >>> def div(x, y, fp=TWOPLACES):
   ...     return (x / y).quantize(fp)
   ...
   >>> mul(a, b)                       # Automatically preserve fixed-point
   Decimal('325.62')
   >>> div(b, a)
   Decimal('0.03')
Q. There are many ways to express the same value. The numbers "200", "200.000", "2E2", and ".02E+4" all have the same value at various precisions. Is there a way to transform them to a single recognizable canonical value? A. The "normalize()" method maps all equivalent values to a single representative:

   >>> values = map(Decimal, '200 200.000 2E2 .02E+4'.split())
   >>> [v.normalize() for v in values]
   [Decimal('2E+2'), Decimal('2E+2'), Decimal('2E+2'), Decimal('2E+2')]

Q. When does rounding occur in a computation? A. It occurs *after* the computation. The philosophy of the decimal specification is that numbers are considered exact and are created independent of the current context. They can even have greater precision than current context. Computations proceed with those exact inputs and then rounding (or other context operations) is applied to the *result* of the computation:

   >>> getcontext().prec = 5
   >>> pi = Decimal('3.1415926535')    # More than 5 digits
   >>> pi                              # All digits are retained
   Decimal('3.1415926535')
   >>> pi + 0                          # Rounded after an addition
   Decimal('3.1416')
   >>> pi - Decimal('0.00005')         # Subtract unrounded numbers, then round
   Decimal('3.1415')
   >>> pi + 0 - Decimal('0.00005')     # Intermediate values are rounded
   Decimal('3.1416')

Q. Some decimal values always print with exponential notation. Is there a way to get a non-exponential representation? A. For some values, exponential notation is the only way to express the number of significant places in the coefficient. For example, expressing "5.0E+3" as "5000" keeps the value constant but cannot show the original's two-place significance. If an application does not care about tracking significance, it is easy to remove the exponent and trailing zeroes, losing significance, but keeping the value unchanged:

   >>> def remove_exponent(d):
   ...     return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()
   >>> remove_exponent(Decimal('5E+3'))
   Decimal('5000')

Q. Is there a way to convert a regular float to a "Decimal"? A. Yes, any binary floating-point number can be exactly expressed as a Decimal though an exact conversion may take more precision than intuition would suggest:

   >>> Decimal(math.pi)
   Decimal('3.141592653589793115997963468544185161590576171875')

Q. Within a complex calculation, how can I make sure that I haven't gotten a spurious result because of insufficient precision or rounding anomalies? A. The decimal module makes it easy to test results. A best practice is to re-run calculations using greater precision and with various rounding modes. Widely differing results indicate insufficient precision, rounding mode issues, ill-conditioned inputs, or a numerically unstable algorithm. Q. I noticed that context precision is applied to the results of operations but not to the inputs. Is there anything to watch out for when mixing values of different precisions? A. Yes. The principle is that all values are considered to be exact and so is the arithmetic on those values. Only the results are rounded. The advantage for inputs is that "what you type is what you get". A disadvantage is that the results can look odd if you forget that the inputs haven't been rounded:

   >>> getcontext().prec = 3
   >>> Decimal('3.104') + Decimal('2.104')
   Decimal('5.21')
   >>> Decimal('3.104') + Decimal('0.000') + Decimal('2.104')
   Decimal('5.20')

The solution is either to increase precision or to force rounding of inputs using the unary plus operation:

   >>> getcontext().prec = 3
   >>> +Decimal('1.23456789')          # unary plus triggers rounding
   Decimal('1.23')

Alternatively, inputs can be rounded upon creation using the "Context.create_decimal()" method:

   >>> Context(prec=5, rounding=ROUND_DOWN).create_decimal('1.2345678')
   Decimal('1.2345')
Q. Is the CPython implementation fast for large numbers? A. Yes. In the CPython and PyPy3 implementations, the C/CFFI versions of the decimal module integrate the high speed libmpdec library for arbitrary precision correctly rounded decimal floating-point arithmetic [1]. "libmpdec" uses Karatsuba multiplication for medium-sized numbers and the Number Theoretic Transform for very large numbers. The context must be adapted for exact arbitrary precision arithmetic. "Emin" and "Emax" should always be set to the maximum values, "clamp" should always be 0 (the default). Setting "prec" requires some care. The easiest approach for trying out bignum arithmetic is to use the maximum value for "prec" as well [2]:

   >>> setcontext(Context(prec=MAX_PREC, Emax=MAX_EMAX, Emin=MIN_EMIN))
   >>> x = Decimal(2) ** 256
   >>> x / 128
   Decimal('904625697166532776746648320380374280103671755200316906558262375061821325312')

For inexact results, "MAX_PREC" is far too large on 64-bit platforms and the available memory will be insufficient:

   >>> Decimal(1) / 3
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   MemoryError

On systems with overallocation (e.g. Linux), a more sophisticated approach is to adjust "prec" to the amount of available RAM. Suppose that you have 8GB of RAM and expect 10 simultaneous operands using a maximum of 500MB each:

   >>> import sys
   >>>
   >>> # Maximum number of digits for a single operand using 500MB in 8-byte words
   >>> # with 19 digits per word (4-byte words and 9 digits per word for the 32-bit build):
   >>> maxdigits = 19 * ((500 * 1024**2) // 8)
   >>>
   >>> # Check that this works:
   >>> c = Context(prec=maxdigits, Emax=MAX_EMAX, Emin=MIN_EMIN)
   >>> c.traps[Inexact] = True
   >>> setcontext(c)
   >>>
   >>> # Fill the available precision with nines:
   >>> x = Decimal(0).logical_invert() * 9
   >>> sys.getsizeof(x)
   524288112
   >>> x + 2
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   decimal.Inexact: [<class 'decimal.Inexact'>]

In general (and especially on systems without overallocation), it is recommended to estimate even tighter bounds and set the "Inexact" trap if all calculations are expected to be exact. [1] Added in version 3.3. [2] Changed in version 3.9: This approach now works for all exact results except for non-integer powers. Development Tools ***************** The modules described in this chapter help you write software. For example, the "pydoc" module takes a module and generates documentation based on the module's contents. The "doctest" and "unittest" modules contain frameworks for writing unit tests that automatically exercise code and verify that the expected output is produced.
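For instance, "pydoc" can be run directly from the command line; this quick sketch uses "difflib" (covered later in this chapter) as the module to document:

   $ python -m pydoc difflib       # render difflib's documentation as text
   $ python -m pydoc -w difflib    # write it to difflib.html instead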
The list of modules described in this chapter is: * "typing" — Support for type hints * Specification for the Python Type System * Type aliases * NewType * Annotating callable objects * Generics * Annotating tuples * The type of class objects * Annotating generators and coroutines * User-defined generic types * The "Any" type * Nominal vs structural subtyping * Module contents * Special typing primitives * Special types * Special forms * Building generic types and type aliases * Other special directives * Protocols * ABCs for working with IO * Functions and decorators * Introspection helpers * Constant * Deprecated aliases * Aliases to built-in types * Aliases to types in "collections" * Aliases to other concrete types * Aliases to container ABCs in "collections.abc" * Aliases to asynchronous ABCs in "collections.abc" * Aliases to other ABCs in "collections.abc" * Aliases to "contextlib" ABCs * Deprecation Timeline of Major Features * "pydoc" — Documentation generator and online help system * Python Development Mode * Effects of the Python Development Mode * ResourceWarning Example * Bad file descriptor error example * "doctest" — Test interactive Python examples * Simple Usage: Checking Examples in Docstrings * Simple Usage: Checking Examples in a Text File * Command-line Usage * How It Works * Which Docstrings Are Examined? * How are Docstring Examples Recognized? * What’s the Execution Context? * What About Exceptions? * Option Flags * Directives * Warnings * Basic API * Unittest API * Advanced API * DocTest Objects * Example Objects * DocTestFinder objects * DocTestParser objects * TestResults objects * DocTestRunner objects * OutputChecker objects * Debugging * Soapbox * "unittest" — Unit testing framework * Basic example * Command-Line Interface * Command-line options * Test Discovery * Organizing test code * Re-using old test code * Skipping tests and expected failures * Distinguishing test iterations using subtests * Classes and functions * Test cases * Grouping tests * Loading and running tests * load_tests Protocol * Class and Module Fixtures * setUpClass and tearDownClass * setUpModule and tearDownModule * Signal Handling * "unittest.mock" — mock object library * Quick Guide * The Mock Class * Calling * Deleting Attributes * Mock names and the name attribute * Attaching Mocks as Attributes * The patchers * patch * patch.object * patch.dict * patch.multiple * patch methods: start and stop * patch builtins * TEST_PREFIX * Nesting Patch Decorators * Where to patch * Patching Descriptors and Proxy Objects * MagicMock and magic method support * Mocking Magic Methods * Magic Mock * Helpers * sentinel * DEFAULT * call * create_autospec * ANY * FILTER_DIR * mock_open * Autospeccing * Sealing mocks * Order of precedence of "side_effect", "return_value" and *wraps* * "unittest.mock" — getting started * Using Mock * Mock Patching Methods * Mock for Method Calls on an Object * Mocking Classes * Naming your mocks * Tracking all Calls * Setting Return Values and Attributes * Raising exceptions with mocks * Side effect functions and iterables * Mocking asynchronous iterators * Mocking asynchronous context manager * Creating a Mock from an Existing Object * Using side_effect to return per file content * Patch Decorators * Further Examples * Mocking chained calls * Partial mocking * Mocking a Generator Method * Applying the same patch to every test method * Mocking Unbound Methods * Checking multiple calls with mock * Coping with mutable arguments * Nesting Patches * Mocking a dictionary with 
MagicMock * Mock subclasses and their attributes * Mocking imports with patch.dict * Tracking order of calls and less verbose call assertions * More complex argument matching * "test" — Regression tests package for Python * Writing Unit Tests for the "test" package * Running tests using the command-line interface * "test.support" — Utilities for the Python test suite * "test.support.socket_helper" — Utilities for socket tests * "test.support.script_helper" — Utilities for the Python execution tests * "test.support.bytecode_helper" — Support tools for testing correct bytecode generation * "test.support.threading_helper" — Utilities for threading tests * "test.support.os_helper" — Utilities for os tests * "test.support.import_helper" — Utilities for import tests * "test.support.warnings_helper" — Utilities for warnings tests Python Development Mode *********************** Added in version 3.7. The Python Development Mode introduces additional runtime checks that are too expensive to be enabled by default. It should not be more verbose than the default if the code is correct; new warnings are only emitted when an issue is detected. It can be enabled using the "-X dev" command line option or by setting the "PYTHONDEVMODE" environment variable to "1". See also Python debug build. Effects of the Python Development Mode ====================================== Enabling the Python Development Mode is similar to the following command, but with additional effects described below: PYTHONMALLOC=debug PYTHONASYNCIODEBUG=1 python -W default -X faulthandler Effects of the Python Development Mode: * Add "default" warning filter. The following warnings are shown: * "DeprecationWarning" * "ImportWarning" * "PendingDeprecationWarning" * "ResourceWarning" Normally, the above warnings are filtered by the default warning filters. It behaves as if the "-W default" command line option is used. Use the "-W error" command line option or set the "PYTHONWARNINGS" environment variable to "error" to treat warnings as errors. * Install debug hooks on memory allocators to check for: * Buffer underflow * Buffer overflow * Memory allocator API violation * Unsafe usage of the GIL See the "PyMem_SetupDebugHooks()" C function. It behaves as if the "PYTHONMALLOC" environment variable is set to "debug". To enable the Python Development Mode without installing debug hooks on memory allocators, set the "PYTHONMALLOC" environment variable to "default". * Call "faulthandler.enable()" at Python startup to install handlers for the "SIGSEGV", "SIGFPE", "SIGABRT", "SIGBUS" and "SIGILL" signals to dump the Python traceback on a crash. It behaves as if the "-X faulthandler" command line option is used or if the "PYTHONFAULTHANDLER" environment variable is set to "1". * Enable asyncio debug mode. For example, "asyncio" checks for coroutines that were not awaited and logs them. It behaves as if the "PYTHONASYNCIODEBUG" environment variable is set to "1". * Check the *encoding* and *errors* arguments for string encoding and decoding operations. Examples: "open()", "str.encode()" and "bytes.decode()". By default, for best performance, the *errors* argument is only checked at the first encoding/decoding error and the *encoding* argument is sometimes ignored for empty strings. * The "io.IOBase" destructor logs "close()" exceptions. * Set the "dev_mode" attribute of "sys.flags" to "True". The Python Development Mode does not enable the "tracemalloc" module by default, because the overhead cost (to performance and memory) would be too large. 
Enabling the "tracemalloc" module provides additional information on the origin of some errors. For example, "ResourceWarning" logs the traceback where the resource was allocated, and a buffer overflow error logs the traceback where the memory block was allocated. The Python Development Mode does not prevent the "-O" command line option from removing "assert" statements nor from setting "__debug__" to "False". The Python Development Mode can only be enabled at the Python startup. Its value can be read from "sys.flags.dev_mode". Changed in version 3.8: The "io.IOBase" destructor now logs "close()" exceptions. Changed in version 3.9: The *encoding* and *errors* arguments are now checked for string encoding and decoding operations. ResourceWarning Example ======================= Example of a script counting the number of lines of the text file specified in the command line: import sys def main(): fp = open(sys.argv[1]) nlines = len(fp.readlines()) print(nlines) # The file is closed implicitly if __name__ == "__main__": main() The script does not close the file explicitly. By default, Python does not emit any warning. Example using README.txt, which has 269 lines: $ python script.py README.txt 269 Enabling the Python Development Mode displays a "ResourceWarning" warning: $ python -X dev script.py README.txt 269 script.py:10: ResourceWarning: unclosed file <_io.TextIOWrapper name='README.rst' mode='r' encoding='UTF-8'> main() ResourceWarning: Enable tracemalloc to get the object allocation traceback In addition, enabling "tracemalloc" shows the line where the file was opened: $ python -X dev -X tracemalloc=5 script.py README.rst 269 script.py:10: ResourceWarning: unclosed file <_io.TextIOWrapper name='README.rst' mode='r' encoding='UTF-8'> main() Object allocated at (most recent call last): File "script.py", lineno 10 main() File "script.py", lineno 4 fp = open(sys.argv[1]) The fix is to close explicitly the file. Example using a context manager: def main(): # Close the file explicitly when exiting the with block with open(sys.argv[1]) as fp: nlines = len(fp.readlines()) print(nlines) Not closing a resource explicitly can leave a resource open for way longer than expected; it can cause severe issues upon exiting Python. It is bad in CPython, but it is even worse in PyPy. Closing resources explicitly makes an application more deterministic and more reliable. Bad file descriptor error example ================================= Script displaying the first line of itself: import os def main(): fp = open(__file__) firstline = fp.readline() print(firstline.rstrip()) os.close(fp.fileno()) # The file is closed implicitly main() By default, Python does not emit any warning: $ python script.py import os The Python Development Mode shows a "ResourceWarning" and logs a “Bad file descriptor” error when finalizing the file object: $ python -X dev script.py import os script.py:10: ResourceWarning: unclosed file <_io.TextIOWrapper name='script.py' mode='r' encoding='UTF-8'> main() ResourceWarning: Enable tracemalloc to get the object allocation traceback Exception ignored in: <_io.TextIOWrapper name='script.py' mode='r' encoding='UTF-8'> Traceback (most recent call last): File "script.py", line 10, in main() OSError: [Errno 9] Bad file descriptor "os.close(fp.fileno())" closes the file descriptor. When the file object finalizer tries to close the file descriptor again, it fails with the "Bad file descriptor" error. A file descriptor must be closed only once. 
In the worst case scenario, closing it twice can lead to a crash (see bpo-18748 for an example). The fix is to remove the "os.close(fp.fileno())" line, or open the file with "closefd=False". Tkinter Dialogs *************** "tkinter.simpledialog" — Standard Tkinter input dialogs ======================================================= **Source code:** Lib/tkinter/simpledialog.py ====================================================================== The "tkinter.simpledialog" module contains convenience classes and functions for creating simple modal dialogs to get a value from the user. tkinter.simpledialog.askfloat(title, prompt, **kw) tkinter.simpledialog.askinteger(title, prompt, **kw) tkinter.simpledialog.askstring(title, prompt, **kw) The above three functions provide dialogs that prompt the user to enter a value of the desired type.
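As a rough sketch of how these functions are typically used (the window title, prompt, and bounds below are invented for illustration):

   import tkinter as tk
   from tkinter import simpledialog

   root = tk.Tk()
   root.withdraw()     # hide the empty root window; only the dialog appears

   # Returns the entered integer, or None if the dialog is cancelled
   age = simpledialog.askinteger("Profile", "How old are you?",
                                 minvalue=0, maxvalue=120)
   print(age)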
class tkinter.simpledialog.Dialog(parent, title=None) The base class for custom dialogs. body(master) Override to construct the dialog's interface and return the widget that should have initial focus. buttonbox() Default behaviour adds OK and Cancel buttons. Override for custom button layouts. "tkinter.filedialog" — File selection dialogs ============================================= **Source code:** Lib/tkinter/filedialog.py ====================================================================== The "tkinter.filedialog" module provides classes and factory functions for creating file/directory selection windows. Native Load/Save Dialogs ------------------------ The following classes and functions provide file dialog windows that combine a native look-and-feel with configuration options to customize behaviour. The following keyword arguments are applicable to the classes and functions listed below:

   *parent* - the window to place the dialog on top of
   *title* - the title of the window
   *initialdir* - the directory that the dialog starts in
   *initialfile* - the file selected upon opening of the dialog
   *filetypes* - a sequence of (label, pattern) tuples, '*' wildcard is allowed
   *defaultextension* - default extension to append to file (save dialogs)
   *multiple* - when true, selection of multiple items is allowed

**Static factory functions** The below functions when called create a modal, native look-and-feel dialog, wait for the user's selection, then return the selected value(s) or "None" to the caller. tkinter.filedialog.askopenfile(mode='r', **options) tkinter.filedialog.askopenfiles(mode='r', **options) The above two functions create an "Open" dialog and return the opened file object(s) in read-only mode. tkinter.filedialog.asksaveasfile(mode='w', **options) Create a "SaveAs" dialog and return a file object opened in write-only mode. tkinter.filedialog.askopenfilename(**options) tkinter.filedialog.askopenfilenames(**options) The above two functions create an "Open" dialog and return the selected filename(s) that correspond to existing file(s). tkinter.filedialog.asksaveasfilename(**options) Create a "SaveAs" dialog and return the selected filename. tkinter.filedialog.askdirectory(**options) Prompt user to select a directory. Additional keyword option: *mustexist* - determines if selection must be an existing directory. class tkinter.filedialog.Open(master=None, **options) class tkinter.filedialog.SaveAs(master=None, **options) The above two classes provide native dialog windows for saving and loading files. **Convenience classes** The below classes are used for creating file/directory windows from scratch. These do not emulate the native look-and-feel of the platform. class tkinter.filedialog.Directory(master=None, **options) Create a dialog prompting the user to select a directory. Note: The *FileDialog* class should be subclassed for custom event handling and behaviour. class tkinter.filedialog.FileDialog(master, title=None) Create a basic file selection dialog. cancel_command(event=None) Trigger the termination of the dialog window. dirs_double_event(event) Event handler for double-click event on directory. dirs_select_event(event) Event handler for click event on directory. files_double_event(event) Event handler for double-click event on file. files_select_event(event) Event handler for single-click event on file. filter_command(event=None) Filter the files by directory. get_filter() Retrieve the file filter currently in use. get_selection() Retrieve the currently selected item. go(dir_or_file=os.curdir, pattern='*', default='', key=None) Render dialog and start event loop. ok_event(event) Exit dialog returning current selection. quit(how=None) Exit dialog returning filename, if any. set_filter(dir, pat) Set the file filter. set_selection(file) Update the current file selection to *file*. class tkinter.filedialog.LoadFileDialog(master, title=None) A subclass of FileDialog that creates a dialog window for selecting an existing file. ok_command() Test that a file is provided and that the selection indicates an already existing file. class tkinter.filedialog.SaveFileDialog(master, title=None) A subclass of FileDialog that creates a dialog window for selecting a destination file. ok_command() Test whether or not the selection points to a valid file that is not a directory. Confirmation is required if an already existing file is selected. "tkinter.commondialog" — Dialog window templates ================================================ **Source code:** Lib/tkinter/commondialog.py ====================================================================== The "tkinter.commondialog" module provides the "Dialog" class that is the base class for dialogs defined in other supporting modules. class tkinter.commondialog.Dialog(master=None, **options) show(color=None, **options) Render the Dialog window. See also: Modules "tkinter.messagebox", Reading and Writing Files "difflib" — Helpers for computing deltas **************************************** **Source code:** Lib/difflib.py ====================================================================== This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce information about file differences in various formats, including HTML and context and unified diffs. For comparing directories and files, see also the "filecmp" module. class difflib.SequenceMatcher This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are *hashable*. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to find the longest contiguous matching subsequence that contains no "junk" elements; these "junk" elements are ones that are uninteresting in some sense, such as blank lines or whitespace. (Handling junk is an extension to the Ratcliff and Obershelp algorithm.) The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence.
This does not yield minimal edit sequences, but does tend to yield matches that “look right” to people. **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst case and quadratic time in the expected case. "SequenceMatcher" is quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear. **Automatic junk heuristic:** "SequenceMatcher" supports a heuristic that automatically treats certain sequence items as junk. The heuristic counts how many times each individual item appears in the sequence. If an item’s duplicates (after the first one) account for more than 1% of the sequence and the sequence is at least 200 items long, this item is marked as “popular” and is treated as junk for the purpose of sequence matching. This heuristic can be turned off by setting the "autojunk" argument to "False" when creating the "SequenceMatcher". Changed in version 3.2: Added the *autojunk* parameter. class difflib.Differ This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses "SequenceMatcher" both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines. Each line of a "Differ" delta begins with a two-letter code: +------------+---------------------------------------------+ | Code | Meaning | |============|=============================================| | "'- '" | line unique to sequence 1 | +------------+---------------------------------------------+ | "'+ '" | line unique to sequence 2 | +------------+---------------------------------------------+ | "' '" | line common to both sequences | +------------+---------------------------------------------+ | "'? '" | line not present in either input sequence | +------------+---------------------------------------------+ Lines beginning with ‘"?"’ attempt to guide the eye to intraline differences, and were not present in either input sequence. These lines can be confusing if the sequences contain whitespace characters, such as spaces, tabs or line breaks. class difflib.HtmlDiff This class can be used to create an HTML table (or a complete HTML file containing the table) showing a side by side, line by line comparison of text with inter-line and intra-line change highlights. The table can be generated in either full or contextual difference mode. The constructor for this class is: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK) Initializes instance of "HtmlDiff". *tabsize* is an optional keyword argument to specify tab stop spacing and defaults to "8". *wrapcolumn* is an optional keyword to specify column number where lines are broken and wrapped, defaults to "None" where lines are not wrapped. *linejunk* and *charjunk* are optional keyword arguments passed into "ndiff()" (used by "HtmlDiff" to generate the side by side HTML differences). See "ndiff()" documentation for argument default values and descriptions. The following methods are public: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5, *, charset='utf-8') Compares *fromlines* and *tolines* (lists of strings) and returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted. *fromdesc* and *todesc* are optional keyword arguments to specify from/to file column header strings (both default to an empty string). 
*context* and *numlines* are both optional keyword arguments. Set *context* to "True" when contextual differences are to be shown, else the default is "False" to show the full files. *numlines* defaults to "5". When *context* is "True" *numlines* controls the number of context lines which surround the difference highlights. When *context* is "False" *numlines* controls the number of lines which are shown before a difference highlight when using the "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place the next difference highlight at the top of the browser without any leading context). Note: *fromdesc* and *todesc* are interpreted as unescaped HTML and should be properly escaped when receiving input from untrusted sources. Changed in version 3.5: *charset* keyword-only argument was added. The default charset of HTML document changed from "'ISO-8859-1'" to "'utf-8'". make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5) Compares *fromlines* and *tolines* (lists of strings) and returns a string which is a complete HTML table showing line by line differences with inter-line and intra-line changes highlighted. The arguments for this method are the same as those for the "make_file()" method. difflib.context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n') Compare *a* and *b* (lists of strings); return a delta (a *generator* generating the delta lines) in context diff format. Context diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in a before/after style. The number of context lines is set by *n* which defaults to three. By default, the diff control lines (those with "***" or "---") are created with a trailing newline. This is helpful so that inputs created from "io.IOBase.readlines()" result in diffs that are suitable for use with "io.IOBase.writelines()" since both the inputs and outputs have trailing newlines. For inputs that do not have trailing newlines, set the *lineterm* argument to "" so that the output will be uniformly newline free. The context diff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for *fromfile*, *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally expressed in the ISO 8601 format. If not specified, the strings default to blanks.

   >>> import sys
   >>> from difflib import *
   >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
   >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
   >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py',
   ...                                    tofile='after.py'))
   *** before.py
   --- after.py
   ***************
   *** 1,4 ****
   ! bacon
   ! eggs
   ! ham
     guido
   --- 1,4 ----
   ! python
   ! eggy
   ! hamster
     guido

See A command-line interface to difflib for a more detailed example. difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6) Return a list of the best "good enough" matches. *word* is a sequence for which close matches are desired (typically a string), and *possibilities* is a list of sequences against which to match *word* (typically a list of strings). Optional argument *n* (default "3") is the maximum number of close matches to return; *n* must be greater than "0". Optional argument *cutoff* (default "0.6") is a float in the range [0, 1]. Possibilities that don't score at least that similar to *word* are ignored.
The best (no more than *n*) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.

   >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
   ['apple', 'ape']
   >>> import keyword
   >>> get_close_matches('wheel', keyword.kwlist)
   ['while']
   >>> get_close_matches('pineapple', keyword.kwlist)
   []
   >>> get_close_matches('accept', keyword.kwlist)
   ['except']

difflib.ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK) Compare *a* and *b* (lists of strings); return a "Differ"-style delta (a *generator* generating the delta lines). Optional keyword parameters *linejunk* and *charjunk* are filtering functions (or "None"): *linejunk*: A function that accepts a single string argument, and returns true if the string is junk, or false if not. The default is "None". There is also a module-level function "IS_LINE_JUNK()", which filters out lines without visible characters, except for at most one pound character ("'#'") – however the underlying "SequenceMatcher" class does a dynamic analysis of which lines are so frequent as to constitute noise, and this usually works better than using this function. *charjunk*: A function that accepts a character (a string of length 1), and returns true if the character is junk, or false if not. The default is the module-level function "IS_CHARACTER_JUNK()", which filters out whitespace characters (a blank or tab; it's a bad idea to include newline in this!).

   >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
   ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
   >>> print(''.join(diff), end="")
   - one
   ?  ^
   + ore
   ?  ^
   - two
   - three
   ?  -
   + tree
   + emu

difflib.restore(sequence, which) Return one of the two sequences that generated a delta. Given a *sequence* produced by "Differ.compare()" or "ndiff()", extract lines originating from file 1 or 2 (parameter *which*), stripping off line prefixes. Example:

   >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
   ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
   >>> diff = list(diff) # materialize the generated delta into a list
   >>> print(''.join(restore(diff, 1)), end="")
   one
   two
   three
   >>> print(''.join(restore(diff, 2)), end="")
   ore
   tree
   emu

difflib.unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n') Compare *a* and *b* (lists of strings); return a delta (a *generator* generating the delta lines) in unified diff format. Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in an inline style (instead of separate before/after blocks). The number of context lines is set by *n* which defaults to three. By default, the diff control lines (those with "---", "+++", or "@@") are created with a trailing newline. This is helpful so that inputs created from "io.IOBase.readlines()" result in diffs that are suitable for use with "io.IOBase.writelines()" since both the inputs and outputs have trailing newlines. For inputs that do not have trailing newlines, set the *lineterm* argument to "" so that the output will be uniformly newline free. The unified diff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for *fromfile*, *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally expressed in the ISO 8601 format. If not specified, the strings default to blanks.
   >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
   >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
   >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
   --- before.py
   +++ after.py
   @@ -1,4 +1,4 @@
   -bacon
   -eggs
   -ham
   +python
   +eggy
   +hamster
    guido

See A command-line interface to difflib for a more detailed example. difflib.diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n') Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a sequence of delta lines (also bytes) in the format returned by *dfunc*. *dfunc* must be a callable, typically either "unified_diff()" or "context_diff()". Allows you to compare data with unknown or inconsistent encoding. All inputs except *n* must be bytes objects, not str. Works by losslessly converting all inputs (except *n*) to str, and calling "dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm)". The output of *dfunc* is then converted back to bytes, so the delta lines that you receive have the same unknown/inconsistent encodings as *a* and *b*. Added in version 3.5. difflib.IS_LINE_JUNK(line) Return "True" for ignorable lines. The line *line* is ignorable if *line* is blank or contains a single "'#'", otherwise it is not ignorable. Used as a default for parameter *linejunk* in "ndiff()" in older versions. difflib.IS_CHARACTER_JUNK(ch) Return "True" for ignorable characters. The character *ch* is ignorable if *ch* is a space or tab, otherwise it is not ignorable. Used as a default for parameter *charjunk* in "ndiff()". See also: Pattern Matching: The Gestalt Approach Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This was published in Dr. Dobb's Journal in July, 1988. SequenceMatcher Objects ======================= The "SequenceMatcher" class has this constructor: class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True) Optional argument *isjunk* must be "None" (the default) or a one-argument function that takes a sequence element and returns true if and only if the element is "junk" and should be ignored. Passing "None" for *isjunk* is equivalent to passing "lambda x: False"; in other words, no elements are ignored. For example, pass:

   lambda x: x in " \t"

if you're comparing lines as sequences of characters, and don't want to synch up on blanks or hard tabs. The optional arguments *a* and *b* are sequences to be compared; both default to empty strings. The elements of both sequences must be *hashable*. The optional argument *autojunk* can be used to disable the automatic junk heuristic. Changed in version 3.2: Added the *autojunk* parameter. SequenceMatcher objects get three data attributes: *bjunk* is the set of elements of *b* for which *isjunk* is "True"; *bpopular* is the set of non-junk elements considered popular by the heuristic (if it is not disabled); *b2j* is a dict mapping the remaining elements of *b* to a list of positions where they occur. All three are reset whenever *b* is reset with "set_seqs()" or "set_seq2()". Added in version 3.2: The *bjunk* and *bpopular* attributes. "SequenceMatcher" objects have the following methods: set_seqs(a, b) Set the two sequences to be compared. "SequenceMatcher" computes and caches detailed information about the second sequence, so if you want to compare one sequence against many sequences, use "set_seq2()" to set the commonly used sequence once and call "set_seq1()" repeatedly, once for each of the other sequences.
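For example, a minimal sketch of that one-to-many pattern (the word list is invented for illustration):

   >>> words = ["apple", "maple", "papal"]
   >>> s = SequenceMatcher()
   >>> s.set_seq2("apple")             # cached and reused for every comparison
   >>> for w in words:
   ...     s.set_seq1(w)
   ...     print(w, round(s.ratio(), 3))
   apple 1.0
   maple 0.8
   papal 0.6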
   set_seq1(a)

      Set the first sequence to be compared. The second sequence to be compared is not changed.

   set_seq2(b)

      Set the second sequence to be compared. The first sequence to be compared is not changed.

   find_longest_match(alo=0, ahi=None, blo=0, bhi=None)

      Find longest matching block in "a[alo:ahi]" and "b[blo:bhi]".

      If *isjunk* was omitted or "None", "find_longest_match()" returns "(i, j, k)" such that "a[i:i+k]" is equal to "b[j:j+k]", where "alo <= i <= i+k <= ahi" and "blo <= j <= j+k <= bhi". For all "(i', j', k')" meeting those conditions, the additional conditions "k >= k'", "i <= i'", and if "i == i'", "j <= j'" are also met. In other words, of all maximal matching blocks, return one that starts earliest in *a*, and of all those maximal matching blocks that start earliest in *a*, return the one that starts earliest in *b*.

         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=0, b=4, size=5)

      If *isjunk* was provided, first the longest matching block is determined as above, but with the additional restriction that no junk element appears in the block. Then that block is extended as far as possible by matching (only) junk elements on both sides. So the resulting block never matches on junk except as identical junk happens to be adjacent to an interesting match.

      Here’s the same example as before, but considering blanks to be junk. That prevents "' abcd'" from matching the "' abcd'" at the tail end of the second sequence directly. Instead only the "'abcd'" can match, and matches the leftmost "'abcd'" in the second sequence:

         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=1, b=0, size=4)

      If no blocks match, this returns "(alo, blo, 0)".

      This method returns a *named tuple* "Match(a, b, size)".

      Changed in version 3.9: Added default arguments.

   get_matching_blocks()

      Return list of triples describing non-overlapping matching subsequences. Each triple is of the form "(i, j, n)", and means that "a[i:i+n] == b[j:j+n]". The triples are monotonically increasing in *i* and *j*.

      The last triple is a dummy, and has the value "(len(a), len(b), 0)". It is the only triple with "n == 0". If "(i, j, n)" and "(i', j', n')" are adjacent triples in the list, and the second is not the last triple in the list, then "i+n < i'" or "j+n < j'"; in other words, adjacent triples always describe non-adjacent equal blocks.

         >>> s = SequenceMatcher(None, "abxcd", "abcd")
         >>> s.get_matching_blocks()
         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]

   get_opcodes()

      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is of the form "(tag, i1, i2, j1, j2)". The first tuple has "i1 == j1 == 0", and remaining tuples have *i1* equal to the *i2* from the preceding tuple, and, likewise, *j1* equal to the previous *j2*.

      The *tag* values are strings, with these meanings:

      +-----------------+-----------------------------------------------+
      | Value           | Meaning                                       |
      |=================|===============================================|
      | "'replace'"     | "a[i1:i2]" should be replaced by "b[j1:j2]".  |
      +-----------------+-----------------------------------------------+
      | "'delete'"      | "a[i1:i2]" should be deleted. Note that "j1   |
      |                 | == j2" in this case.                          |
      +-----------------+-----------------------------------------------+
      | "'insert'"      | "b[j1:j2]" should be inserted at "a[i1:i1]".  |
      |                 | Note that "i1 == i2" in this case.            |
      +-----------------+-----------------------------------------------+
      | "'equal'"       | "a[i1:i2] == b[j1:j2]" (the sub-sequences are |
      |                 | equal).                                       |
      +-----------------+-----------------------------------------------+

      For example:

         >>> a = "qabxcd"
         >>> b = "abycdf"
         >>> s = SequenceMatcher(None, a, b)
         >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
         ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
         ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
         delete    a[0:1] --> b[0:0]      'q' --> ''
         equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
         replace   a[3:4] --> b[2:3]      'x' --> 'y'
         equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
         insert    a[6:6] --> b[5:6]       '' --> 'f'

   get_grouped_opcodes(n=3)

      Return a *generator* of groups with up to *n* lines of context.

      Starting with the groups returned by "get_opcodes()", this method splits out smaller change clusters and eliminates intervening ranges which have no changes.

      The groups are returned in the same format as "get_opcodes()".

   ratio()

      Return a measure of the sequences’ similarity as a float in the range [0, 1].

      Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is "1.0" if the sequences are identical, and "0.0" if they have nothing in common.

      This is expensive to compute if "get_matching_blocks()" or "get_opcodes()" hasn’t already been called, in which case you may want to try "quick_ratio()" or "real_quick_ratio()" first to get an upper bound.

      Note:

        Caution: The result of a "ratio()" call may depend on the order of the arguments. For instance:

           >>> SequenceMatcher(None, 'tide', 'diet').ratio()
           0.25
           >>> SequenceMatcher(None, 'diet', 'tide').ratio()
           0.5

   quick_ratio()

      Return an upper bound on "ratio()" relatively quickly.

   real_quick_ratio()

      Return an upper bound on "ratio()" very quickly.

The three methods that return the ratio of matching to total characters can give different results due to differing levels of approximation, although "quick_ratio()" and "real_quick_ratio()" are always at least as large as "ratio()":

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0


SequenceMatcher Examples
========================

This example compares two strings, considering blanks to be “junk”:

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

"ratio()" returns a float in [0, 1], measuring the similarity of the sequences. As a rule of thumb, a "ratio()" value over 0.6 means the sequences are close matches:

   >>> print(round(s.ratio(), 3))
   0.866

If you’re only interested in where the sequences match, "get_matching_blocks()" is handy:

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by "get_matching_blocks()" is always a dummy, "(len(a), len(b), 0)", and this is the only case in which the last tuple element (number of elements matched) is "0".

If you want to know how to change the first sequence into the second, use "get_opcodes()":

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

See also:

  * The "get_close_matches()" function in this module which shows how simple code building on "SequenceMatcher" can be used to do useful work.
* Simple version control recipe for a small application built with "SequenceMatcher". Differ Objects ============== Note that "Differ"-generated deltas make no claim to be **minimal** diffs. To the contrary, minimal diffs are often counter-intuitive, because they synch up anywhere possible, sometimes accidental matches 100 pages apart. Restricting synch points to contiguous matches preserves some notion of locality, at the occasional cost of producing a longer diff. The "Differ" class has this constructor: class difflib.Differ(linejunk=None, charjunk=None) Optional keyword parameters *linejunk* and *charjunk* are for filter functions (or "None"): *linejunk*: A function that accepts a single string argument, and returns true if the string is junk. The default is "None", meaning that no line is considered junk. *charjunk*: A function that accepts a single character argument (a string of length 1), and returns true if the character is junk. The default is "None", meaning that no character is considered junk. These junk-filtering functions speed up matching to find differences and do not cause any differing lines or characters to be ignored. Read the description of the "find_longest_match()" method’s *isjunk* parameter for an explanation. "Differ" objects are used (deltas generated) via a single method: compare(a, b) Compare two sequences of lines, and generate the delta (a sequence of lines). Each sequence must contain individual single-line strings ending with newlines. Such sequences can be obtained from the "readlines()" method of file-like objects. The delta generated also consists of newline-terminated strings, ready to be printed as-is via the "writelines()" method of a file-like object. Differ Example ============== This example compares two texts. First we set up the texts, sequences of individual single-line strings ending with newlines (such sequences can also be obtained from the "readlines()" method of file-like objects): >>> text1 = ''' 1. Beautiful is better than ugly. ... 2. Explicit is better than implicit. ... 3. Simple is better than complex. ... 4. Complex is better than complicated. ... '''.splitlines(keepends=True) >>> len(text1) 4 >>> text1[0][-1] '\n' >>> text2 = ''' 1. Beautiful is better than ugly. ... 3. Simple is better than complex. ... 4. Complicated is better than complex. ... 5. Flat is better than nested. ... '''.splitlines(keepends=True) Next we instantiate a Differ object: >>> d = Differ() Note that when instantiating a "Differ" object we may pass functions to filter out line and character “junk.” See the "Differ()" constructor for details. Finally, we compare the two: >>> result = list(d.compare(text1, text2)) "result" is a list of strings, so let’s pretty-print it: >>> from pprint import pprint >>> pprint(result) [' 1. Beautiful is better than ugly.\n', '- 2. Explicit is better than implicit.\n', '- 3. Simple is better than complex.\n', '+ 3. Simple is better than complex.\n', '? ++\n', '- 4. Complex is better than complicated.\n', '? ^ ---- ^\n', '+ 4. Complicated is better than complex.\n', '? ++++ ^ ^\n', '+ 5. Flat is better than nested.\n'] As a single multi-line string it looks like this: >>> import sys >>> sys.stdout.writelines(result) 1. Beautiful is better than ugly. - 2. Explicit is better than implicit. - 3. Simple is better than complex. + 3. Simple is better than complex. ? ++ - 4. Complex is better than complicated. ? ^ ---- ^ + 4. Complicated is better than complex. ? ++++ ^ ^ + 5. Flat is better than nested. 
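As a final consistency check (a sketch reusing "result", "text1" and "text2" from above), either input can be rebuilt from the delta with "restore()":

   >>> from difflib import restore
   >>> ''.join(restore(result, 1)) == ''.join(text1)
   True
   >>> ''.join(restore(result, 2)) == ''.join(text2)
   True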
A command-line interface to difflib
===================================

This example shows how to use difflib to create a "diff"-like utility.

   """ Command line interface to difflib.py providing diffs in four formats:

   * ndiff:    lists every line and highlights interline changes.
   * context:  highlights clusters of changes in a before/after format.
   * unified:  highlights clusters of changes in an inline format.
   * html:     generates side by side comparison with change highlights.

   """

   import sys, os, difflib, argparse
   from datetime import datetime, timezone

   def file_mtime(path):
       t = datetime.fromtimestamp(os.stat(path).st_mtime,
                                  timezone.utc)
       return t.astimezone().isoformat()

   def main():

       parser = argparse.ArgumentParser()
       parser.add_argument('-c', action='store_true', default=False,
                           help='Produce a context format diff (default)')
       parser.add_argument('-u', action='store_true', default=False,
                           help='Produce a unified format diff')
       parser.add_argument('-m', action='store_true', default=False,
                           help='Produce HTML side by side diff '
                                '(can use -c and -l in conjunction)')
       parser.add_argument('-n', action='store_true', default=False,
                           help='Produce a ndiff format diff')
       parser.add_argument('-l', '--lines', type=int, default=3,
                           help='Set number of context lines (default 3)')
       parser.add_argument('fromfile')
       parser.add_argument('tofile')
       options = parser.parse_args()

       n = options.lines
       fromfile = options.fromfile
       tofile = options.tofile

       fromdate = file_mtime(fromfile)
       todate = file_mtime(tofile)
       with open(fromfile) as ff:
           fromlines = ff.readlines()
       with open(tofile) as tf:
           tolines = tf.readlines()

       if options.u:
           diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)
       elif options.n:
           diff = difflib.ndiff(fromlines, tolines)
       elif options.m:
           diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                               tofile, context=options.c,
                                               numlines=n)
       else:
           diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)

       sys.stdout.writelines(diff)

   if __name__ == '__main__':
       main()


ndiff example
=============

This example shows how to use "difflib.ndiff()".

   """ndiff [-q] file1 file2
       or
   ndiff (-r1 | -r2) < ndiff_output > file1_or_file2

   Print a human-friendly file difference report to stdout.  Both
   inter- and intra-line differences are noted.  In the second form,
   recreate file1 (-r1) or file2 (-r2) on stdout, from an ndiff
   report on stdin.

   In the first form, if -q ("quiet") is not specified, the first two
   lines of output are

   -: file1
   +: file2

   Each remaining line begins with a two-letter code:

       "- "    line unique to file1
       "+ "    line unique to file2
       "  "    line common to both files
       "? "    line not present in either input file

   Lines beginning with "? " attempt to guide the eye to intraline
   differences, and were not present in either input file.  These lines
   can be confusing if the source files contain tab characters.

   The first file can be recovered by retaining only lines that begin
   with "  " or "- ", and deleting those 2-character prefixes; use ndiff
   with -r1.

   The second file can be recovered similarly, but by retaining only
   "  " and "+ " lines; use ndiff with -r2; or, on Unix, the second
   file can be recovered by piping the output through

       sed -n '/^[+ ] /s/^..//p'
   """

   __version__ = 1, 7, 0

   import difflib, sys

   def fail(msg):
       out = sys.stderr.write
       out(msg + "\n\n")
       out(__doc__)
       return 0

   # open a file & return the file object; gripe and return 0 if it
   # couldn't be opened
   def fopen(fname):
       try:
           return open(fname)
       except IOError as detail:
           return fail("couldn't open " + fname + ": " + str(detail))

   # open two files & spray the diff to stdout; return false iff a problem
   def fcompare(f1name, f2name):
       f1 = fopen(f1name)
       f2 = fopen(f2name)
       if not f1 or not f2:
           return 0

       a = f1.readlines(); f1.close()
       b = f2.readlines(); f2.close()
       for line in difflib.ndiff(a, b):
           print(line, end='')    # the delta lines already end with newlines

       return 1

   # crack args (sys.argv[1:] is normal) & compare;
   # return false iff a problem
   def main(args):
       import getopt
       try:
           opts, args = getopt.getopt(args, "qr:")
       except getopt.error as detail:
           return fail(str(detail))
       noisy = 1
       qseen = rseen = 0
       for opt, val in opts:
           if opt == "-q":
               qseen = 1
               noisy = 0
           elif opt == "-r":
               rseen = 1
               whichfile = val
       if qseen and rseen:
           return fail("can't specify both -q and -r")
       if rseen:
           if args:
               return fail("no args allowed with -r option")
           if whichfile in ("1", "2"):
               restore(whichfile)
               return 1
           return fail("-r value must be 1 or 2")
       if len(args) != 2:
           return fail("need 2 filename args")
       f1name, f2name = args
       if noisy:
           print('-:', f1name)
           print('+:', f2name)
       return fcompare(f1name, f2name)

   # read ndiff output from stdin, and print file1 (which=='1') or
   # file2 (which=='2') to stdout
   def restore(which):
       restored = difflib.restore(sys.stdin.readlines(), which)
       sys.stdout.writelines(restored)

   if __name__ == '__main__':
       main(sys.argv[1:])


"dis" — Disassembler for Python bytecode
****************************************

**Source code:** Lib/dis.py

======================================================================

The "dis" module supports the analysis of CPython *bytecode* by disassembling it. The CPython bytecode which this module takes as an input is defined in the file "Include/opcode.h" and used by the compiler and the interpreter.

**CPython implementation detail:** Bytecode is an implementation detail of the CPython interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. This module should not be expected to work across Python VMs or Python releases.

Changed in version 3.6: Use 2 bytes for each instruction. Previously the number of bytes varied by instruction.

Changed in version 3.10: The argument of jump, exception handling and loop instructions is now the instruction offset rather than the byte offset.

Changed in version 3.11: Some instructions are accompanied by one or more inline cache entries, which take the form of "CACHE" instructions. These instructions are hidden by default, but can be shown by passing "show_caches=True" to any "dis" utility. Furthermore, the interpreter now adapts the bytecode to specialize it for different runtime conditions. The adaptive bytecode can be shown by passing "adaptive=True".

Changed in version 3.12: The argument of a jump is the offset of the target instruction relative to the instruction that appears immediately after the jump instruction’s "CACHE" entries. As a consequence, the presence of the "CACHE" instructions is transparent for forward jumps but needs to be taken into account when reasoning about backward jumps.
Changed in version 3.13: The output shows logical labels rather than instruction offsets for jump targets and exception handlers. The "-O" command line option and the "show_offsets" argument were added. Example: Given the function "myfunc()": def myfunc(alist): return len(alist) the following command can be used to display the disassembly of "myfunc()": >>> dis.dis(myfunc) 2 RESUME 0 3 LOAD_GLOBAL 1 (len + NULL) LOAD_FAST 0 (alist) CALL 1 RETURN_VALUE (The “2” is a line number). Command-line interface ====================== The "dis" module can be invoked as a script from the command line: python -m dis [-h] [-C] [-O] [infile] The following options are accepted: -h, --help Display usage and exit. -C, --show-caches Show inline caches. Added in version 3.13. -O, --show-offsets Show offsets of instructions. Added in version 3.13. If "infile" is specified, its disassembled code will be written to stdout. Otherwise, disassembly is performed on compiled source code received from stdin. Bytecode analysis ================= Added in version 3.4. The bytecode analysis API allows pieces of Python code to be wrapped in a "Bytecode" object that provides easy access to details of the compiled code. class dis.Bytecode(x, *, first_line=None, current_offset=None, show_caches=False, adaptive=False, show_offsets=False) Analyse the bytecode corresponding to a function, generator, asynchronous generator, coroutine, method, string of source code, or a code object (as returned by "compile()"). This is a convenience wrapper around many of the functions listed below, most notably "get_instructions()", as iterating over a "Bytecode" instance yields the bytecode operations as "Instruction" instances. If *first_line* is not "None", it indicates the line number that should be reported for the first source line in the disassembled code. Otherwise, the source line information (if any) is taken directly from the disassembled code object. If *current_offset* is not "None", it refers to an instruction offset in the disassembled code. Setting this means "dis()" will display a “current instruction” marker against the specified opcode. If *show_caches* is "True", "dis()" will display inline cache entries used by the interpreter to specialize the bytecode. If *adaptive* is "True", "dis()" will display specialized bytecode that may be different from the original bytecode. If *show_offsets* is "True", "dis()" will include instruction offsets in the output. classmethod from_traceback(tb, *, show_caches=False) Construct a "Bytecode" instance from the given traceback, setting *current_offset* to the instruction responsible for the exception. codeobj The compiled code object. first_line The first source line of the code object (if available) dis() Return a formatted view of the bytecode operations (the same as printed by "dis.dis()", but returned as a multi-line string). info() Return a formatted multi-line string with detailed information about the code object, like "code_info()". Changed in version 3.7: This can now handle coroutine and asynchronous generator objects. Changed in version 3.11: Added the *show_caches* and *adaptive* parameters. Example: >>> bytecode = dis.Bytecode(myfunc) >>> for instr in bytecode: ... print(instr.opname) ... RESUME LOAD_GLOBAL LOAD_FAST CALL RETURN_VALUE Analysis functions ================== The "dis" module also defines the following analysis functions that convert the input directly to the desired output. 
They can be useful if only a single operation is being performed, so the intermediate analysis object isn’t useful:

dis.code_info(x)

   Return a formatted multi-line string with detailed code object information for the supplied function, generator, asynchronous generator, coroutine, method, source code string or code object.

   Note that the exact contents of code info strings are highly implementation dependent and they may change arbitrarily across Python VMs or Python releases.

   Added in version 3.2.

   Changed in version 3.7: This can now handle coroutine and asynchronous generator objects.

dis.show_code(x, *, file=None)

   Print detailed code object information for the supplied function, method, source code string or code object to *file* (or "sys.stdout" if *file* is not specified).

   This is a convenient shorthand for "print(code_info(x), file=file)", intended for interactive exploration at the interpreter prompt.

   Added in version 3.2.

   Changed in version 3.4: Added *file* parameter.

dis.dis(x=None, *, file=None, depth=None, show_caches=False, adaptive=False)

   Disassemble the *x* object. *x* can denote either a module, a class, a method, a function, a generator, an asynchronous generator, a coroutine, a code object, a string of source code or a byte sequence of raw bytecode. For a module, it disassembles all functions. For a class, it disassembles all methods (including class and static methods). For a code object or sequence of raw bytecode, it prints one line per bytecode instruction. It also recursively disassembles nested code objects. These can include generator expressions, nested functions, the bodies of nested classes, and the code objects used for annotation scopes. Strings are first compiled to code objects with the "compile()" built-in function before being disassembled. If no object is provided, this function disassembles the last traceback.

   The disassembly is written as text to the supplied *file* argument if provided and to "sys.stdout" otherwise.

   The maximal depth of recursion is limited by *depth* unless it is "None". "depth=0" means no recursion.

   If *show_caches* is "True", this function will display inline cache entries used by the interpreter to specialize the bytecode.

   If *adaptive* is "True", this function will display specialized bytecode that may be different from the original bytecode.

   Changed in version 3.4: Added *file* parameter.

   Changed in version 3.7: Implemented recursive disassembling and added *depth* parameter.

   Changed in version 3.7: This can now handle coroutine and asynchronous generator objects.

   Changed in version 3.11: Added the *show_caches* and *adaptive* parameters.

dis.distb(tb=None, *, file=None, show_caches=False, adaptive=False, show_offsets=False)

   Disassemble the top-of-stack function of a traceback, using the last traceback if none was passed. The instruction causing the exception is indicated.

   The disassembly is written as text to the supplied *file* argument if provided and to "sys.stdout" otherwise.

   Changed in version 3.4: Added *file* parameter.

   Changed in version 3.11: Added the *show_caches* and *adaptive* parameters.

   Changed in version 3.13: Added the *show_offsets* parameter.

dis.disassemble(code, lasti=-1, *, file=None, show_caches=False, adaptive=False)
dis.disco(code, lasti=-1, *, file=None, show_caches=False, adaptive=False, show_offsets=False)

   Disassemble a code object, indicating the last instruction if *lasti* was provided. The output is divided into the following columns:

   1. the line number, for the first instruction of each line

   2. the current instruction, indicated as "-->",

   3. a labelled instruction, indicated with ">>",

   4. the address of the instruction,

   5. the operation code name,

   6. operation parameters, and

   7. interpretation of the parameters in parentheses.

   The parameter interpretation recognizes local and global variable names, constant values, branch targets, and compare operators.

   The disassembly is written as text to the supplied *file* argument if provided and to "sys.stdout" otherwise.

   Changed in version 3.4: Added *file* parameter.

   Changed in version 3.11: Added the *show_caches* and *adaptive* parameters.

   Changed in version 3.13: Added the *show_offsets* parameter.

dis.get_instructions(x, *, first_line=None, show_caches=False, adaptive=False)

   Return an iterator over the instructions in the supplied function, method, source code string or code object.

   The iterator generates a series of "Instruction" named tuples giving the details of each operation in the supplied code.

   If *first_line* is not "None", it indicates the line number that should be reported for the first source line in the disassembled code. Otherwise, the source line information (if any) is taken directly from the disassembled code object.

   The *adaptive* parameter works as it does in "dis()".

   Added in version 3.4.

   Changed in version 3.11: Added the *show_caches* and *adaptive* parameters.

   Changed in version 3.13: The *show_caches* parameter is deprecated and has no effect. The iterator generates the "Instruction" instances with the *cache_info* field populated (regardless of the value of *show_caches*) and it no longer generates separate items for the cache entries.

dis.findlinestarts(code)

   This generator function uses the "co_lines()" method of the code object *code* to find the offsets which are starts of lines in the source code. They are generated as "(offset, lineno)" pairs.

   Changed in version 3.6: Line numbers can be decreasing. Before, they were always increasing.

   Changed in version 3.10: The **PEP 626** "co_lines()" method is used instead of the "co_firstlineno" and "co_lnotab" attributes of the code object.

   Changed in version 3.13: Line numbers can be "None" for bytecode that does not map to source lines.

dis.findlabels(code)

   Detect all offsets in the raw compiled bytecode string *code* which are jump targets, and return a list of these offsets.

dis.stack_effect(opcode, oparg=None, *, jump=None)

   Compute the stack effect of *opcode* with argument *oparg*.

   If the code has a jump target and *jump* is "True", "stack_effect()" will return the stack effect of jumping. If *jump* is "False", it will return the stack effect of not jumping. And if *jump* is "None" (default), it will return the maximal stack effect of both cases.

   Added in version 3.4.

   Changed in version 3.8: Added *jump* parameter.

   Changed in version 3.13: If "oparg" is omitted (or "None"), the stack effect is now returned for "oparg=0". Previously this was an error for opcodes that use their arg. It is also no longer an error to pass an integer "oparg" when the "opcode" does not use it; the "oparg" in this case is ignored.


Python Bytecode Instructions
============================

The "get_instructions()" function and "Bytecode" class provide details of bytecode instructions as "Instruction" instances:

class dis.Instruction

   Details for a bytecode operation.

   opcode

      numeric code for operation, corresponding to the opcode values listed below and the bytecode values in the Opcode collections.
   opname

      human readable name for operation

   baseopcode

      numeric code for the base operation if operation is specialized; otherwise equal to "opcode"

   baseopname

      human readable name for the base operation if operation is specialized; otherwise equal to "opname"

   arg

      numeric argument to operation (if any), otherwise "None"

   oparg

      alias for "arg"

   argval

      resolved arg value (if any), otherwise "None"

   argrepr

      human readable description of operation argument (if any), otherwise an empty string.

   offset

      start index of operation within bytecode sequence

   start_offset

      start index of operation within bytecode sequence, including prefixed "EXTENDED_ARG" operations if present; otherwise equal to "offset"

   cache_offset

      start index of the cache entries following the operation

   end_offset

      end index of the cache entries following the operation

   starts_line

      "True" if this opcode starts a source line, otherwise "False"

   line_number

      source line number associated with this opcode (if any), otherwise "None"

   is_jump_target

      "True" if other code jumps to here, otherwise "False"

   jump_target

      bytecode index of the jump target if this is a jump operation, otherwise "None"

   positions

      "dis.Positions" object holding the start and end locations that are covered by this instruction.

   cache_info

      Information about the cache entries of this instruction, as triplets of the form "(name, size, data)", where the "name" and "size" describe the cache format and "data" is the contents of the cache. "cache_info" is "None" if the instruction does not have caches.

   Added in version 3.4.

   Changed in version 3.11: Field "positions" is added.

   Changed in version 3.13: Changed field "starts_line". Added fields "start_offset", "cache_offset", "end_offset", "baseopname", "baseopcode", "jump_target", "oparg", "line_number" and "cache_info".

class dis.Positions

   In case the information is not available, some fields might be "None".

   lineno

   end_lineno

   col_offset

   end_col_offset

   Added in version 3.11.

The Python compiler currently generates the following bytecode instructions.

**General instructions**

In the following, we will refer to the interpreter stack as "STACK" and describe operations on it as if it were a Python list. The top of the stack corresponds to "STACK[-1]" in this language.

NOP

   Do nothing code. Used as a placeholder by the bytecode optimizer, and to generate line tracing events.

POP_TOP

   Removes the top-of-stack item:

      STACK.pop()

END_FOR

   Removes the top-of-stack item. Equivalent to "POP_TOP". Used to clean up at the end of loops, hence the name.

   Added in version 3.12.

END_SEND

   Implements "del STACK[-2]". Used to clean up when a generator exits.

   Added in version 3.12.

COPY(i)

   Push the i-th item to the top of the stack without removing it from its original location:

      assert i > 0
      STACK.append(STACK[-i])

   Added in version 3.11.

SWAP(i)

   Swap the top of the stack with the i-th element:

      STACK[-i], STACK[-1] = STACK[-1], STACK[-i]

   Added in version 3.11.

CACHE

   Rather than being an actual instruction, this opcode is used to mark extra space for the interpreter to cache useful data directly in the bytecode itself. It is automatically hidden by all "dis" utilities, but can be viewed with "show_caches=True".

   Logically, this space is part of the preceding instruction. Many opcodes expect to be followed by an exact number of caches, and will instruct the interpreter to skip over them at runtime.

   Populated caches can look like arbitrary instructions, so great care should be taken when reading or modifying raw, adaptive bytecode containing quickened data.
   Added in version 3.11.

**Unary operations**

Unary operations take the top of the stack, apply the operation, and push the result back on the stack.

UNARY_NEGATIVE

   Implements "STACK[-1] = -STACK[-1]".

UNARY_NOT

   Implements "STACK[-1] = not STACK[-1]".

   Changed in version 3.13: This instruction now requires an exact "bool" operand.

UNARY_INVERT

   Implements "STACK[-1] = ~STACK[-1]".

GET_ITER

   Implements "STACK[-1] = iter(STACK[-1])".

GET_YIELD_FROM_ITER

   If "STACK[-1]" is a *generator iterator* or *coroutine* object it is left as is. Otherwise, implements "STACK[-1] = iter(STACK[-1])".

   Added in version 3.5.

TO_BOOL

   Implements "STACK[-1] = bool(STACK[-1])".

   Added in version 3.13.

**Binary and in-place operations**

Binary operations remove the top two items from the stack ("STACK[-1]" and "STACK[-2]"). They perform the operation, then put the result back on the stack.

In-place operations are like binary operations, but the operation is done in-place when "STACK[-2]" supports it, and the resulting "STACK[-1]" may be (but does not have to be) the original "STACK[-2]". (A quick numerical check of these stack transfers, using "stack_effect()", follows the coroutine opcodes below.)

BINARY_OP(op)

   Implements the binary and in-place operators (depending on the value of *op*):

      rhs = STACK.pop()
      lhs = STACK.pop()
      STACK.append(lhs op rhs)

   Added in version 3.11.

BINARY_SUBSCR

   Implements:

      key = STACK.pop()
      container = STACK.pop()
      STACK.append(container[key])

STORE_SUBSCR

   Implements:

      key = STACK.pop()
      container = STACK.pop()
      value = STACK.pop()
      container[key] = value

DELETE_SUBSCR

   Implements:

      key = STACK.pop()
      container = STACK.pop()
      del container[key]

BINARY_SLICE

   Implements:

      end = STACK.pop()
      start = STACK.pop()
      container = STACK.pop()
      STACK.append(container[start:end])

   Added in version 3.12.

STORE_SLICE

   Implements:

      end = STACK.pop()
      start = STACK.pop()
      container = STACK.pop()
      values = STACK.pop()
      container[start:end] = values

   Added in version 3.12.

**Coroutine opcodes**

GET_AWAITABLE(where)

   Implements "STACK[-1] = get_awaitable(STACK[-1])", where "get_awaitable(o)" returns "o" if "o" is a coroutine object or a generator object with the "CO_ITERABLE_COROUTINE" flag, or resolves "o.__await__".

   If the "where" operand is nonzero, it indicates where the instruction occurs:

   * "1": After a call to "__aenter__"

   * "2": After a call to "__aexit__"

   Added in version 3.5.

   Changed in version 3.11: Previously, this instruction did not have an oparg.

GET_AITER

   Implements "STACK[-1] = STACK[-1].__aiter__()".

   Added in version 3.5.

   Changed in version 3.7: Returning awaitable objects from "__aiter__" is no longer supported.

GET_ANEXT

   Implements "STACK.append(get_awaitable(STACK[-1].__anext__()))". See "GET_AWAITABLE" for details about "get_awaitable".

   Added in version 3.5.

END_ASYNC_FOR

   Terminates an "async for" loop. Handles an exception raised when awaiting a next item. The stack contains the async iterable in "STACK[-2]" and the raised exception in "STACK[-1]". Both are popped. If the exception is not "StopAsyncIteration", it is re-raised.

   Added in version 3.8.

   Changed in version 3.11: The exception representation on the stack now consists of one, not three, items.

CLEANUP_THROW

   Handles an exception raised during a "throw()" or "close()" call through the current frame. If "STACK[-1]" is an instance of "StopIteration", pop three values from the stack and push its "value" member. Otherwise, re-raise "STACK[-1]".

   Added in version 3.12.

BEFORE_ASYNC_WITH

   Resolves "__aenter__" and "__aexit__" from "STACK[-1]". Pushes "__aexit__" and the result of "__aenter__()" to the stack:

      STACK.extend((__aexit__, __aenter__()))

   Added in version 3.5.
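The net stack transfers described in the preceding sections can be spot-checked with "stack_effect()", documented above. A minimal sketch (the values shown are for CPython 3.13; exact effects can vary across versions):

   >>> import dis
   >>> dis.stack_effect(dis.opmap['BINARY_OP'], 0)    # pops lhs and rhs, pushes one result
   -1
   >>> dis.stack_effect(dis.opmap['STORE_SUBSCR'])    # pops value, container and key
   -3
   >>> dis.stack_effect(dis.opmap['UNARY_NOT'])       # replaces STACK[-1] in place
   0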
**Miscellaneous opcodes**

SET_ADD(i)

   Implements:

      item = STACK.pop()
      set.add(STACK[-i], item)

   Used to implement set comprehensions.

LIST_APPEND(i)

   Implements:

      item = STACK.pop()
      list.append(STACK[-i], item)

   Used to implement list comprehensions.

MAP_ADD(i)

   Implements:

      value = STACK.pop()
      key = STACK.pop()
      dict.__setitem__(STACK[-i], key, value)

   Used to implement dict comprehensions.

   Added in version 3.1.

   Changed in version 3.8: Map value is "STACK[-1]" and map key is "STACK[-2]". Before, those were reversed.

For all of the "SET_ADD", "LIST_APPEND" and "MAP_ADD" instructions, while the added value or key/value pair is popped off, the container object remains on the stack so that it is available for further iterations of the loop.

RETURN_VALUE

   Returns with "STACK[-1]" to the caller of the function.

RETURN_CONST(consti)

   Returns with "co_consts[consti]" to the caller of the function.

   Added in version 3.12.

YIELD_VALUE

   Yields "STACK.pop()" from a *generator*.

   Changed in version 3.11: oparg set to be the stack depth.

   Changed in version 3.12: oparg set to be the exception block depth, for efficient closing of generators.

   Changed in version 3.13: oparg is "1" if this instruction is part of a yield-from or await, and "0" otherwise.

SETUP_ANNOTATIONS

   Checks whether "__annotations__" is defined in "locals()"; if not, it is set up to an empty "dict". This opcode is only emitted if a class or module body contains *variable annotations* statically.

   Added in version 3.6.

POP_EXCEPT

   Pops a value from the stack, which is used to restore the exception state.

   Changed in version 3.11: The exception representation on the stack now consists of one, not three, items.

RERAISE

   Re-raises the exception currently on top of the stack. If oparg is non-zero, pops an additional value from the stack which is used to set "f_lasti" of the current frame.

   Added in version 3.9.

   Changed in version 3.11: The exception representation on the stack now consists of one, not three, items.

PUSH_EXC_INFO

   Pops a value from the stack. Pushes the current exception to the top of the stack. Pushes the value originally popped back to the stack.

   Used in exception handlers.

   Added in version 3.11.

CHECK_EXC_MATCH

   Performs exception matching for "except". Tests whether the "STACK[-2]" is an exception matching "STACK[-1]". Pops "STACK[-1]" and pushes the boolean result of the test.

   Added in version 3.11.

CHECK_EG_MATCH

   Performs exception matching for "except*". Applies "split(STACK[-1])" on the exception group representing "STACK[-2]".

   In case of a match, pops two items from the stack and pushes the non-matching subgroup ("None" in case of full match) followed by the matching subgroup. When there is no match, pops one item (the match type) and pushes "None".

   Added in version 3.11.

WITH_EXCEPT_START

   Calls the function in position 4 on the stack with arguments (type, val, tb) representing the exception at the top of the stack. Used to implement the call "context_manager.__exit__(*exc_info())" when an exception has occurred in a "with" statement.

   Added in version 3.9.

   Changed in version 3.11: The "__exit__" function is in position 4 of the stack rather than 7. The exception representation on the stack now consists of one, not three, items.

LOAD_ASSERTION_ERROR

   Pushes "AssertionError" onto the stack. Used by the "assert" statement.

   Added in version 3.9.

LOAD_BUILD_CLASS

   Pushes "builtins.__build_class__()" onto the stack. It is later called to construct a class.

BEFORE_WITH

   This opcode performs several operations before a with block starts.
   First, it loads "__exit__()" from the context manager and pushes it onto the stack for later use by "WITH_EXCEPT_START". Then, "__enter__()" is called. Finally, the result of calling the "__enter__()" method is pushed onto the stack.

   Added in version 3.11.

GET_LEN

   Perform "STACK.append(len(STACK[-1]))". Used in "match" statements where a comparison with the structure of the pattern is needed.

   Added in version 3.10.

MATCH_MAPPING

   If "STACK[-1]" is an instance of "collections.abc.Mapping" (or, more technically: if it has the "Py_TPFLAGS_MAPPING" flag set in its "tp_flags"), push "True" onto the stack. Otherwise, push "False".

   Added in version 3.10.

MATCH_SEQUENCE

   If "STACK[-1]" is an instance of "collections.abc.Sequence" and is *not* an instance of "str"/"bytes"/"bytearray" (or, more technically: if it has the "Py_TPFLAGS_SEQUENCE" flag set in its "tp_flags"), push "True" onto the stack. Otherwise, push "False".

   Added in version 3.10.

MATCH_KEYS

   "STACK[-1]" is a tuple of mapping keys, and "STACK[-2]" is the match subject. If "STACK[-2]" contains all of the keys in "STACK[-1]", push a "tuple" containing the corresponding values. Otherwise, push "None".

   Added in version 3.10.

   Changed in version 3.11: Previously, this instruction also pushed a boolean value indicating success ("True") or failure ("False").

STORE_NAME(namei)

   Implements "name = STACK.pop()". *namei* is the index of *name* in the attribute "co_names" of the code object. The compiler tries to use "STORE_FAST" or "STORE_GLOBAL" if possible.

DELETE_NAME(namei)

   Implements "del name", where *namei* is the index into the "co_names" attribute of the code object.

UNPACK_SEQUENCE(count)

   Unpacks "STACK[-1]" into *count* individual values, which are put onto the stack right-to-left. Require there to be exactly *count* values:

      assert len(STACK[-1]) == count
      STACK.extend(STACK.pop()[:-count-1:-1])

UNPACK_EX(counts)

   Implements assignment with a starred target: Unpacks an iterable in "STACK[-1]" into individual values, where the total number of values can be smaller than the number of items in the iterable: one of the new values will be a list of all leftover items.

   The number of values before and after the list value is limited to 255.

   The number of values before the list value is encoded in the argument of the opcode. The number of values after the list, if any, is encoded using an "EXTENDED_ARG". As a consequence, the argument can be seen as a two-byte value where the low byte of *counts* is the number of values before the list value, and the high byte of *counts* is the number of values after it.

   The extracted values are put onto the stack right-to-left, i.e. "a, *b, c = d" will be stored after execution as "STACK.extend((a, b, c))".

STORE_ATTR(namei)

   Implements:

      obj = STACK.pop()
      value = STACK.pop()
      obj.name = value

   where *namei* is the index of name in "co_names" of the code object.

DELETE_ATTR(namei)

   Implements:

      obj = STACK.pop()
      del obj.name

   where *namei* is the index of name into "co_names" of the code object.

STORE_GLOBAL(namei)

   Works as "STORE_NAME", but stores the name as a global.

DELETE_GLOBAL(namei)

   Works as "DELETE_NAME", but deletes a global name.

LOAD_CONST(consti)

   Pushes "co_consts[consti]" onto the stack.

LOAD_NAME(namei)

   Pushes the value associated with "co_names[namei]" onto the stack. The name is looked up within the locals, then the globals, then the builtins.

LOAD_LOCALS

   Pushes a reference to the locals dictionary onto the stack. This is used to prepare namespace dictionaries for "LOAD_FROM_DICT_OR_DEREF" and "LOAD_FROM_DICT_OR_GLOBALS".
   Added in version 3.12.

LOAD_FROM_DICT_OR_GLOBALS(i)

   Pops a mapping off the stack and looks up the value for "co_names[namei]". If the name is not found there, looks it up in the globals and then the builtins, similar to "LOAD_GLOBAL". This is used for loading global variables in annotation scopes within class bodies.

   Added in version 3.12.

BUILD_TUPLE(count)

   Creates a tuple consuming *count* items from the stack, and pushes the resulting tuple onto the stack:

      if count == 0:
          value = ()
      else:
          value = tuple(STACK[-count:])
          STACK = STACK[:-count]

      STACK.append(value)

BUILD_LIST(count)

   Works as "BUILD_TUPLE", but creates a list.

BUILD_SET(count)

   Works as "BUILD_TUPLE", but creates a set.

BUILD_MAP(count)

   Pushes a new dictionary object onto the stack. Pops "2 * count" items so that the dictionary holds *count* entries: "{..., STACK[-4]: STACK[-3], STACK[-2]: STACK[-1]}".

   Changed in version 3.5: The dictionary is created from stack items instead of creating an empty dictionary pre-sized to hold *count* items.

BUILD_CONST_KEY_MAP(count)

   The version of "BUILD_MAP" specialized for constant keys. Pops the top element on the stack which contains a tuple of keys, then starting from "STACK[-2]", pops *count* values to form values in the built dictionary.

   Added in version 3.6.

BUILD_STRING(count)

   Concatenates *count* strings from the stack and pushes the resulting string onto the stack.

   Added in version 3.6.

LIST_EXTEND(i)

   Implements:

      seq = STACK.pop()
      list.extend(STACK[-i], seq)

   Used to build lists.

   Added in version 3.9.

SET_UPDATE(i)

   Implements:

      seq = STACK.pop()
      set.update(STACK[-i], seq)

   Used to build sets.

   Added in version 3.9.

DICT_UPDATE(i)

   Implements:

      map = STACK.pop()
      dict.update(STACK[-i], map)

   Used to build dicts.

   Added in version 3.9.

DICT_MERGE(i)

   Like "DICT_UPDATE" but raises an exception for duplicate keys.

   Added in version 3.9.

LOAD_ATTR(namei)

   If the low bit of "namei" is not set, this replaces "STACK[-1]" with "getattr(STACK[-1], co_names[namei>>1])".

   If the low bit of "namei" is set, this will attempt to load a method named "co_names[namei>>1]" from the "STACK[-1]" object. "STACK[-1]" is popped. This bytecode distinguishes two cases: if "STACK[-1]" has a method with the correct name, the bytecode pushes the unbound method and "STACK[-1]". "STACK[-1]" will be used as the first argument ("self") by "CALL" or "CALL_KW" when calling the unbound method. Otherwise, "NULL" and the object returned by the attribute lookup are pushed.

   Changed in version 3.12: If the low bit of "namei" is set, then a "NULL" or "self" is pushed to the stack before the attribute or unbound method respectively.

LOAD_SUPER_ATTR(namei)

   This opcode implements "super()", both in its zero-argument and two-argument forms (e.g. "super().method()", "super().attr" and "super(cls, self).method()", "super(cls, self).attr").

   It pops three values from the stack (from top of stack down):

   * "self": the first argument to the current method

   * "cls": the class within which the current method was defined

   * the global "super"

   With respect to its argument, it works similarly to "LOAD_ATTR", except that "namei" is shifted left by 2 bits instead of 1.

   The low bit of "namei" signals to attempt a method load, as with "LOAD_ATTR", which results in pushing "NULL" and the loaded method. When it is unset a single value is pushed to the stack.

   The second-low bit of "namei", if set, means that this was a two-argument call to "super()" (unset means zero-argument).

   Added in version 3.12.

COMPARE_OP(opname)

   Performs a Boolean operation.
The operation name can be found in "cmp_op[opname >> 5]". If the fifth-lowest bit of "opname" is set ("opname & 16"), the result should be coerced to "bool". Changed in version 3.13: The fifth-lowest bit of the oparg now indicates a forced conversion to "bool". IS_OP(invert) Performs "is" comparison, or "is not" if "invert" is 1. Added in version 3.9. CONTAINS_OP(invert) Performs "in" comparison, or "not in" if "invert" is 1. Added in version 3.9. IMPORT_NAME(namei) Imports the module "co_names[namei]". "STACK[-1]" and "STACK[-2]" are popped and provide the *fromlist* and *level* arguments of "__import__()". The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent "STORE_FAST" instruction modifies the namespace. IMPORT_FROM(namei) Loads the attribute "co_names[namei]" from the module found in "STACK[-1]". The resulting object is pushed onto the stack, to be subsequently stored by a "STORE_FAST" instruction. JUMP_FORWARD(delta) Increments bytecode counter by *delta*. JUMP_BACKWARD(delta) Decrements bytecode counter by *delta*. Checks for interrupts. Added in version 3.11. JUMP_BACKWARD_NO_INTERRUPT(delta) Decrements bytecode counter by *delta*. Does not check for interrupts. Added in version 3.11. POP_JUMP_IF_TRUE(delta) If "STACK[-1]" is true, increments the bytecode counter by *delta*. "STACK[-1]" is popped. Changed in version 3.11: The oparg is now a relative delta rather than an absolute target. This opcode is a pseudo-instruction, replaced in final bytecode by the directed versions (forward/backward). Changed in version 3.12: This is no longer a pseudo-instruction. Changed in version 3.13: This instruction now requires an exact "bool" operand. POP_JUMP_IF_FALSE(delta) If "STACK[-1]" is false, increments the bytecode counter by *delta*. "STACK[-1]" is popped. Changed in version 3.11: The oparg is now a relative delta rather than an absolute target. This opcode is a pseudo-instruction, replaced in final bytecode by the directed versions (forward/backward). Changed in version 3.12: This is no longer a pseudo-instruction. Changed in version 3.13: This instruction now requires an exact "bool" operand. POP_JUMP_IF_NOT_NONE(delta) If "STACK[-1]" is not "None", increments the bytecode counter by *delta*. "STACK[-1]" is popped. Added in version 3.11. Changed in version 3.12: This is no longer a pseudo-instruction. POP_JUMP_IF_NONE(delta) If "STACK[-1]" is "None", increments the bytecode counter by *delta*. "STACK[-1]" is popped. Added in version 3.11. Changed in version 3.12: This is no longer a pseudo-instruction. FOR_ITER(delta) "STACK[-1]" is an *iterator*. Call its "__next__()" method. If this yields a new value, push it on the stack (leaving the iterator below it). If the iterator indicates it is exhausted then the byte code counter is incremented by *delta*. Changed in version 3.12: Up until 3.11 the iterator was popped when it was exhausted. LOAD_GLOBAL(namei) Loads the global named "co_names[namei>>1]" onto the stack. Changed in version 3.11: If the low bit of "namei" is set, then a "NULL" is pushed to the stack before the global variable. LOAD_FAST(var_num) Pushes a reference to the local "co_varnames[var_num]" onto the stack. Changed in version 3.12: This opcode is now only used in situations where the local variable is guaranteed to be initialized. It cannot raise "UnboundLocalError". LOAD_FAST_LOAD_FAST(var_nums) Pushes references to "co_varnames[var_nums >> 4]" and "co_varnames[var_nums & 15]" onto the stack. 
Added in version 3.13. LOAD_FAST_CHECK(var_num) Pushes a reference to the local "co_varnames[var_num]" onto the stack, raising an "UnboundLocalError" if the local variable has not been initialized. Added in version 3.12. LOAD_FAST_AND_CLEAR(var_num) Pushes a reference to the local "co_varnames[var_num]" onto the stack (or pushes "NULL" onto the stack if the local variable has not been initialized) and sets "co_varnames[var_num]" to "NULL". Added in version 3.12. STORE_FAST(var_num) Stores "STACK.pop()" into the local "co_varnames[var_num]". STORE_FAST_STORE_FAST(var_nums) Stores "STACK[-1]" into "co_varnames[var_nums >> 4]" and "STACK[-2]" into "co_varnames[var_nums & 15]". Added in version 3.13. STORE_FAST_LOAD_FAST(var_nums) Stores "STACK.pop()" into the local "co_varnames[var_nums >> 4]" and pushes a reference to the local "co_varnames[var_nums & 15]" onto the stack. Added in version 3.13. DELETE_FAST(var_num) Deletes local "co_varnames[var_num]". MAKE_CELL(i) Creates a new cell in slot "i". If that slot is nonempty then that value is stored into the new cell. Added in version 3.11. LOAD_DEREF(i) Loads the cell contained in slot "i" of the “fast locals” storage. Pushes a reference to the object the cell contains on the stack. Changed in version 3.11: "i" is no longer offset by the length of "co_varnames". LOAD_FROM_DICT_OR_DEREF(i) Pops a mapping off the stack and looks up the name associated with slot "i" of the “fast locals” storage in this mapping. If the name is not found there, loads it from the cell contained in slot "i", similar to "LOAD_DEREF". This is used for loading *closure variables* in class bodies (which previously used "LOAD_CLASSDEREF") and in annotation scopes within class bodies. Added in version 3.12. STORE_DEREF(i) Stores "STACK.pop()" into the cell contained in slot "i" of the “fast locals” storage. Changed in version 3.11: "i" is no longer offset by the length of "co_varnames". DELETE_DEREF(i) Empties the cell contained in slot "i" of the “fast locals” storage. Used by the "del" statement. Added in version 3.2. Changed in version 3.11: "i" is no longer offset by the length of "co_varnames". COPY_FREE_VARS(n) Copies the "n" *free (closure) variables* from the closure into the frame. Removes the need for special code on the caller’s side when calling closures. Added in version 3.11. RAISE_VARARGS(argc) Raises an exception using one of the 3 forms of the "raise" statement, depending on the value of *argc*: * 0: "raise" (re-raise previous exception) * 1: "raise STACK[-1]" (raise exception instance or type at "STACK[-1]") * 2: "raise STACK[-2] from STACK[-1]" (raise exception instance or type at "STACK[-2]" with "__cause__" set to "STACK[-1]") CALL(argc) Calls a callable object with the number of arguments specified by "argc". On the stack are (in ascending order): * The callable * "self" or "NULL" * The remaining positional arguments "argc" is the total of the positional arguments, excluding "self". "CALL" pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object. Added in version 3.11. Changed in version 3.13: The callable now always appears at the same position on the stack. Changed in version 3.13: Calls with keyword arguments are now handled by "CALL_KW". CALL_KW(argc) Calls a callable object with the number of arguments specified by "argc", including one or more named arguments. 
   On the stack are (in ascending order):

   * The callable

   * "self" or "NULL"

   * The remaining positional arguments

   * The named arguments

   * A "tuple" of keyword argument names

   "argc" is the total of the positional and named arguments, excluding "self". The length of the tuple of keyword argument names is the number of named arguments.

   "CALL_KW" pops all arguments, the keyword names, and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object.

   Added in version 3.13.

CALL_FUNCTION_EX(flags)

   Calls a callable object with a variable set of positional and keyword arguments. If the lowest bit of *flags* is set, the top of the stack contains a mapping object containing additional keyword arguments, with an iterable holding the positional arguments below it. Before the callable is called, the mapping object and the iterable are each “unpacked” and their contents passed in as keyword and positional arguments respectively. "CALL_FUNCTION_EX" pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object.

   Added in version 3.6.

PUSH_NULL

   Pushes a "NULL" to the stack. Used in the call sequence to match the "NULL" pushed by "LOAD_METHOD" for non-method calls.

   Added in version 3.11.

MAKE_FUNCTION

   Pushes a new function object on the stack built from the code object at "STACK[-1]".

   Changed in version 3.10: Flag value "0x04" is a tuple of strings instead of a dictionary.

   Changed in version 3.11: Qualified name at "STACK[-1]" was removed.

   Changed in version 3.13: Extra function attributes on the stack, signaled by oparg flags, were removed. They now use "SET_FUNCTION_ATTRIBUTE".

SET_FUNCTION_ATTRIBUTE(flag)

   Sets an attribute on a function object. Expects the function at "STACK[-1]" and the attribute value to set at "STACK[-2]"; consumes both and leaves the function at "STACK[-1]". The flag determines which attribute to set:

   * "0x01" a tuple of default values for positional-only and positional-or-keyword parameters in positional order

   * "0x02" a dictionary of keyword-only parameters’ default values

   * "0x04" a tuple of strings containing parameters’ annotations

   * "0x08" a tuple containing cells for free variables, making a closure

   Added in version 3.13.

BUILD_SLICE(argc)

   Pushes a slice object on the stack. *argc* must be 2 or 3. If it is 2, implements:

      end = STACK.pop()
      start = STACK.pop()
      STACK.append(slice(start, end))

   if it is 3, implements:

      step = STACK.pop()
      end = STACK.pop()
      start = STACK.pop()
      STACK.append(slice(start, end, step))

   See the "slice()" built-in function for more information.

EXTENDED_ARG(ext)

   Prefixes any opcode which has an argument too big to fit into the default one byte. *ext* holds an additional byte which acts as higher bits in the argument. For each opcode, at most three "EXTENDED_ARG" prefixes are allowed, forming an argument from two to four bytes.

CONVERT_VALUE(oparg)

   Convert value to a string, depending on "oparg":

      value = STACK.pop()
      result = func(value)
      STACK.append(result)

   * "oparg == 1": call "str()" on *value*

   * "oparg == 2": call "repr()" on *value*

   * "oparg == 3": call "ascii()" on *value*

   Used for implementing formatted string literals (f-strings).

   Added in version 3.13.

FORMAT_SIMPLE

   Formats the value on top of stack:

      value = STACK.pop()
      result = value.__format__("")
      STACK.append(result)

   Used for implementing formatted string literals (f-strings).

   Added in version 3.13.
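As a plain-Python sketch of the semantics (not the exact code the compiler emits), a "repr()" conversion followed by default formatting corresponds to:

   >>> value = 3.14
   >>> repr(value).__format__("")    # CONVERT_VALUE with oparg 2, then FORMAT_SIMPLE
   '3.14'
   >>> f"{value!r}"
   '3.14'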
FORMAT_WITH_SPEC Formats the given value with the given format spec: spec = STACK.pop() value = STACK.pop() result = value.__format__(spec) STACK.append(result) Used for implementing formatted string literals (f-strings). Added in version 3.13. MATCH_CLASS(count) "STACK[-1]" is a tuple of keyword attribute names, "STACK[-2]" is the class being matched against, and "STACK[-3]" is the match subject. *count* is the number of positional sub-patterns. Pop "STACK[-1]", "STACK[-2]", and "STACK[-3]". If "STACK[-3]" is an instance of "STACK[-2]" and has the positional and keyword attributes required by *count* and "STACK[-1]", push a tuple of extracted attributes. Otherwise, push "None". Added in version 3.10. Changed in version 3.11: Previously, this instruction also pushed a boolean value indicating success ("True") or failure ("False"). RESUME(context) A no-op. Performs internal tracing, debugging and optimization checks. The "context" oparand consists of two parts. The lowest two bits indicate where the "RESUME" occurs: * "0" The start of a function, which is neither a generator, coroutine nor an async generator * "1" After a "yield" expression * "2" After a "yield from" expression * "3" After an "await" expression The next bit is "1" if the RESUME is at except-depth "1", and "0" otherwise. Added in version 3.11. Changed in version 3.13: The oparg value changed to include information about except-depth RETURN_GENERATOR Create a generator, coroutine, or async generator from the current frame. Used as first opcode of in code object for the above mentioned callables. Clear the current frame and return the newly created generator. Added in version 3.11. SEND(delta) Equivalent to "STACK[-1] = STACK[-2].send(STACK[-1])". Used in "yield from" and "await" statements. If the call raises "StopIteration", pop the top value from the stack, push the exception’s "value" attribute, and increment the bytecode counter by *delta*. Added in version 3.11. HAVE_ARGUMENT This is not really an opcode. It identifies the dividing line between opcodes in the range [0,255] which don’t use their argument and those that do ("< HAVE_ARGUMENT" and ">= HAVE_ARGUMENT", respectively). If your application uses pseudo instructions or specialized instructions, use the "hasarg" collection instead. Changed in version 3.6: Now every instruction has an argument, but opcodes "< HAVE_ARGUMENT" ignore it. Before, only opcodes ">= HAVE_ARGUMENT" had an argument. Changed in version 3.12: Pseudo instructions were added to the "dis" module, and for them it is not true that comparison with "HAVE_ARGUMENT" indicates whether they use their arg. Deprecated since version 3.13: Use "hasarg" instead. CALL_INTRINSIC_1 Calls an intrinsic function with one argument. Passes "STACK[-1]" as the argument and sets "STACK[-1]" to the result. Used to implement functionality that is not performance critical. The operand determines which intrinsic function is called: +-------------------------------------+-------------------------------------+ | Operand | Description | |=====================================|=====================================| | "INTRINSIC_1_INVALID" | Not valid | +-------------------------------------+-------------------------------------+ | "INTRINSIC_PRINT" | Prints the argument to standard | | | out. Used in the REPL. | +-------------------------------------+-------------------------------------+ | "INTRINSIC_IMPORT_STAR" | Performs "import *" for the named | | | module. 
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_STOPITERATION_ERROR"     | Extracts the return value from a    |
   |                                     | "StopIteration" exception.          |
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_ASYNC_GEN_WRAP"          | Wraps an async generator value      |
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_UNARY_POSITIVE"          | Performs the unary "+" operation    |
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_LIST_TO_TUPLE"           | Converts a list to a tuple          |
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_TYPEVAR"                 | Creates a "typing.TypeVar"          |
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_PARAMSPEC"               | Creates a "typing.ParamSpec"        |
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_TYPEVARTUPLE"            | Creates a "typing.TypeVarTuple"     |
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_SUBSCRIPT_GENERIC"       | Returns "typing.Generic"            |
   |                                     | subscripted with the argument       |
   +-------------------------------------+-------------------------------------+
   | "INTRINSIC_TYPEALIAS"               | Creates a "typing.TypeAliasType";   |
   |                                     | used in the "type" statement. The   |
   |                                     | argument is a tuple of the type     |
   |                                     | alias’s name, type parameters, and  |
   |                                     | value.                              |
   +-------------------------------------+-------------------------------------+

   Added in version 3.12.

CALL_INTRINSIC_2

   Calls an intrinsic function with two arguments. Used to implement
   functionality that is not performance critical:

      arg2 = STACK.pop()
      arg1 = STACK.pop()
      result = intrinsic2(arg1, arg2)
      STACK.append(result)

   The operand determines which intrinsic function is called:

   +------------------------------------------+-------------------------------------+
   | Operand                                  | Description                         |
   |==========================================|=====================================|
   | "INTRINSIC_2_INVALID"                    | Not valid                           |
   +------------------------------------------+-------------------------------------+
   | "INTRINSIC_PREP_RERAISE_STAR"            | Calculates the "ExceptionGroup" to  |
   |                                          | raise from a "try-except*".         |
   +------------------------------------------+-------------------------------------+
   | "INTRINSIC_TYPEVAR_WITH_BOUND"           | Creates a "typing.TypeVar" with a   |
   |                                          | bound.                              |
   +------------------------------------------+-------------------------------------+
   | "INTRINSIC_TYPEVAR_WITH_CONSTRAINTS"     | Creates a "typing.TypeVar" with     |
   |                                          | constraints.                        |
   +------------------------------------------+-------------------------------------+
   | "INTRINSIC_SET_FUNCTION_TYPE_PARAMS"     | Sets the "__type_params__"          |
   |                                          | attribute of a function.            |
   +------------------------------------------+-------------------------------------+

   Added in version 3.12.

**Pseudo-instructions**

These opcodes do not appear in Python bytecode. They are used by the
compiler but are replaced by real opcodes or removed before bytecode
is generated.

SETUP_FINALLY(target)

   Set up an exception handler for the following code block. If an
   exception occurs, the value stack level is restored to its current
   state and control is transferred to the exception handler at
   "target".

SETUP_CLEANUP(target)

   Like "SETUP_FINALLY", but in case of an exception also pushes the
   last instruction ("lasti") to the stack so that "RERAISE" can
   restore it.
   If an exception occurs, the value stack level and the last
   instruction on the frame are restored to their current state, and
   control is transferred to the exception handler at "target".

SETUP_WITH(target)

   Like "SETUP_CLEANUP", but in case of an exception one more item is
   popped from the stack before control is transferred to the
   exception handler at "target".

   This variant is used in "with" and "async with" constructs, which
   push the return value of the context manager’s "__enter__()" or
   "__aenter__()" to the stack.

POP_BLOCK

   Marks the end of the code block associated with the last
   "SETUP_FINALLY", "SETUP_CLEANUP" or "SETUP_WITH".

JUMP

JUMP_NO_INTERRUPT

   Undirected relative jump instructions which are replaced by their
   directed (forward/backward) counterparts by the assembler.

LOAD_CLOSURE(i)

   Pushes a reference to the cell contained in slot "i" of the “fast
   locals” storage.

   Note that "LOAD_CLOSURE" is replaced with "LOAD_FAST" in the
   assembler.

   Changed in version 3.13: This opcode is now a pseudo-instruction.

LOAD_METHOD

   Optimized unbound method lookup. Emitted as a "LOAD_ATTR" opcode
   with a flag set in the arg.

Opcode collections
==================

These collections are provided for automatic introspection of bytecode
instructions:

Changed in version 3.12: The collections now contain pseudo
instructions and instrumented instructions as well. These are opcodes
with values ">= MIN_PSEUDO_OPCODE" and ">= MIN_INSTRUMENTED_OPCODE".

dis.opname

   Sequence of operation names, indexable using the bytecode.

dis.opmap

   Dictionary mapping operation names to bytecodes.

dis.cmp_op

   Sequence of all compare operation names.

dis.hasarg

   Sequence of bytecodes that use their argument.

   Added in version 3.12.

dis.hasconst

   Sequence of bytecodes that access a constant.

dis.hasfree

   Sequence of bytecodes that access a *free (closure) variable*.
   ‘free’ in this context refers to names in the current scope that
   are referenced by inner scopes or names in outer scopes that are
   referenced from this scope. It does *not* include references to
   global or builtin scopes.

dis.hasname

   Sequence of bytecodes that access an attribute by name.

dis.hasjump

   Sequence of bytecodes that have a jump target. All jumps are
   relative.

   Added in version 3.13.

dis.haslocal

   Sequence of bytecodes that access a local variable.

dis.hascompare

   Sequence of bytecodes of Boolean operations.

dis.hasexc

   Sequence of bytecodes that set an exception handler.

   Added in version 3.12.

dis.hasjrel

   Sequence of bytecodes that have a relative jump target.

   Deprecated since version 3.13: All jumps are now relative. Use
   "hasjump".

dis.hasjabs

   Sequence of bytecodes that have an absolute jump target.

   Deprecated since version 3.13: All jumps are now relative. This
   list is empty.

Software Packaging and Distribution
***********************************

These libraries help you with publishing and installing Python
software. While these modules are designed to work in conjunction with
the Python Package Index, they can also be used with a local index
server, or without any index server at all.
* "ensurepip" — Bootstrapping the "pip" installer * Command line interface * Module API * "venv" — Creation of virtual environments * Creating virtual environments * How venvs work * API * An example of extending "EnvBuilder" * "zipapp" — Manage executable Python zip archives * Basic Example * Command-Line Interface * Python API * Examples * Specifying the Interpreter * Creating Standalone Applications with zipapp * Caveats * The Python Zip Application Archive Format "distutils" — Building and installing Python modules **************************************************** Deprecated since version 3.10, removed in version 3.12. This module is no longer part of the Python standard library. It was removed in Python 3.12 after being deprecated in Python 3.10. The removal was decided in **PEP 632**, which has migration advice. The last version of Python that provided the "distutils" module was Python 3.11. "doctest" — Test interactive Python examples ******************************************** **Source code:** Lib/doctest.py ====================================================================== The "doctest" module searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown. There are several common ways to use doctest: * To check that a module’s docstrings are up-to-date by verifying that all interactive examples still work as documented. * To perform regression testing by verifying that interactive examples from a test file or a test object work as expected. * To write tutorial documentation for a package, liberally illustrated with input-output examples. Depending on whether the examples or the expository text are emphasized, this has the flavor of “literate testing” or “executable documentation”. Here’s a complete but small example module: """ This is the "example" module. The example module supplies one function, factorial(). For example, >>> factorial(5) 120 """ def factorial(n): """Return the factorial of n, an exact integer >= 0. >>> [factorial(n) for n in range(6)] [1, 1, 2, 6, 24, 120] >>> factorial(30) 265252859812191058636308480000000 >>> factorial(-1) Traceback (most recent call last): ... ValueError: n must be >= 0 Factorials of floats are OK, but the float must be an exact integer: >>> factorial(30.1) Traceback (most recent call last): ... ValueError: n must be exact integer >>> factorial(30.0) 265252859812191058636308480000000 It must also not be ridiculously large: >>> factorial(1e100) Traceback (most recent call last): ... OverflowError: n too large """ import math if not n >= 0: raise ValueError("n must be >= 0") if math.floor(n) != n: raise ValueError("n must be exact integer") if n+1 == n: # catch a value like 1e300 raise OverflowError("n too large") result = 1 factor = 2 while factor <= n: result *= factor factor += 1 return result if __name__ == "__main__": import doctest doctest.testmod() If you run "example.py" directly from the command line, "doctest" works its magic: $ python example.py $ There’s no output! That’s normal, and it means all the examples worked. Pass "-v" to the script, and "doctest" prints a detailed log of what it’s trying, and prints a summary at the end: $ python example.py -v Trying: factorial(5) Expecting: 120 ok Trying: [factorial(n) for n in range(6)] Expecting: [1, 1, 2, 6, 24, 120] ok And so on, eventually ending with: Trying: factorial(1e100) Expecting: Traceback (most recent call last): ... 
       OverflowError: n too large
   ok
   2 items passed all tests:
      1 test in __main__
      6 tests in __main__.factorial
   7 tests in 2 items.
   7 passed.
   Test passed.
   $

That’s all you need to know to start making productive use of
"doctest"! Jump in. The following sections provide full details. Note
that there are many examples of doctests in the standard Python test
suite and libraries. Especially useful examples can be found in the
standard test file "Lib/test/test_doctest/test_doctest.py".

Simple Usage: Checking Examples in Docstrings
=============================================

The simplest way to start using doctest (but not necessarily the way
you’ll continue to do it) is to end each module "M" with:

   if __name__ == "__main__":
       import doctest
       doctest.testmod()

"doctest" then examines docstrings in module "M". Running the module
as a script causes the examples in the docstrings to get executed and
verified:

   python M.py

This won’t display anything unless an example fails, in which case the
failing example(s) and the cause(s) of the failure(s) are printed to
stdout, and the final line of output is "***Test Failed*** N
failures.", where *N* is the number of examples that failed.

Run it with the "-v" switch instead:

   python M.py -v

and a detailed report of all examples tried is printed to standard
output, along with assorted summaries at the end.

You can force verbose mode by passing "verbose=True" to "testmod()",
or prohibit it by passing "verbose=False". In either of those cases,
"sys.argv" is not examined by "testmod()" (so passing "-v" or not has
no effect).

There is also a command line shortcut for running "testmod()"; see
section Command-line Usage.

For more information on "testmod()", see section Basic API.

Simple Usage: Checking Examples in a Text File
==============================================

Another simple application of doctest is testing interactive examples
in a text file. This can be done with the "testfile()" function:

   import doctest
   doctest.testfile("example.txt")

That short script executes and verifies any interactive Python
examples contained in the file "example.txt". The file content is
treated as if it were a single giant docstring; the file doesn’t need
to contain a Python program! For example, perhaps "example.txt"
contains this:

   The ``example`` module
   ======================

   Using ``factorial``
   -------------------

   This is an example text file in reStructuredText format. First
   import ``factorial`` from the ``example`` module:

       >>> from example import factorial

   Now use it:

       >>> factorial(6)
       120

Running "doctest.testfile("example.txt")" then finds the error in this
documentation:

   File "./example.txt", line 14, in example.txt
   Failed example:
       factorial(6)
   Expected:
       120
   Got:
       720

As with "testmod()", "testfile()" won’t display anything unless an
example fails. If an example does fail, then the failing example(s)
and the cause(s) of the failure(s) are printed to stdout, using the
same format as "testmod()".

By default, "testfile()" looks for files in the calling module’s
directory. See section Basic API for a description of the optional
arguments that can be used to tell it to look for files in other
locations.

Like "testmod()", "testfile()"’s verbosity can be set with the "-v"
command-line switch or with the optional keyword argument *verbose*.

There is also a command line shortcut for running "testfile()"; see
section Command-line Usage.

For more information on "testfile()", see section Basic API.
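The *verbose* argument can be combined with other optional arguments.
As an illustrative sketch (reusing the "example.txt" file from above),
the following forces a verbose report and enables the "ELLIPSIS"
option flag for every example in the file:

   import doctest

   # Verify the examples in "example.txt", printing a detailed report
   # of every example tried. ELLIPSIS lets an expected output use "..."
   # to match any substring of the actual output.
   doctest.testfile("example.txt", verbose=True,
                    optionflags=doctest.ELLIPSIS)

Because *verbose* is passed explicitly here, "sys.argv" is not
consulted, just as with "testmod()".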
Command-line Usage
==================

The "doctest" module can be invoked as a script from the command line:

   python -m doctest [-v] [-o OPTION] [-f] file [file ...]

-v, --verbose

   A detailed report of all examples tried is printed to standard
   output, along with assorted summaries at the end:

      python -m doctest -v example.py

   This will import "example.py" as a standalone module and run
   "testmod()" on it. Note that this may not work correctly if the
   file is part of a package and imports other submodules from that
   package.

   If the file name does not end with ".py", "doctest" infers that it
   must be run with "testfile()" instead:

      python -m doctest -v example.txt

-o, --option