cherrypy._cpreqbody

Request body processing for CherryPy.

New in version 3.2.

Application authors have complete control over the parsing of HTTP request entities. In short, cherrypy.request.body is now always set to an instance of RequestBody, and that class is a subclass of Entity.

When an HTTP request includes an entity body, it is often desirable to provide that information to applications in a form other than the raw bytes. Different content types demand different approaches. Examples:

  • For a GIF file, we want the raw bytes in a stream.
  • An HTML form is better parsed into its component fields, and each text field decoded from bytes to unicode.
  • A JSON body should be deserialized into a Python dict or list.

When the request contains a Content-Type header, the media type is used as a key to look up a value in the request.body.processors dict. If the full media type is not found, then the major type is tried; for example, if no processor is found for the ‘image/jpeg’ type, then we look for a processor for the major type ‘image’ as a whole. If neither the full type nor the major type has a matching processor, then a default processor is used (default_proc). For most types, this means no processing is done, and the body is left unread as a raw byte stream. Processors are configurable in an ‘on_start_resource’ hook.
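The lookup order above (full media type, then major type, then the default) can be sketched as a small function. This is an illustration only: find_processor is a hypothetical helper, not part of CherryPy, and it assumes that any parameters such as "; boundary=..." are stripped before the lookup.

```python
def find_processor(content_type, processors, default_proc):
    """Pick a processor: full media type first, then major type, then default.

    Hypothetical helper for illustration; not CherryPy's own code.
    """
    # Drop parameters such as "; charset=utf-8" or "; boundary=...".
    media_type = content_type.split(";", 1)[0].strip()
    if media_type in processors:
        return processors[media_type]
    # Fall back to the major type, e.g. "image" for "image/jpeg".
    major = media_type.split("/", 1)[0]
    if major in processors:
        return processors[major]
    return default_proc


# Strings stand in for processor functions in this demo.
procs = {"multipart/form-data": "form_data_proc", "image": "image_proc"}
print(find_processor("image/jpeg", procs, "default"))                       # image_proc
print(find_processor("multipart/form-data; boundary=x", procs, "default"))  # form_data_proc
print(find_processor("text/plain", procs, "default"))                       # default
```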

Some processors, especially those for the ‘text’ types, attempt to decode bytes to unicode. If the Content-Type request header includes a ‘charset’ parameter, this is used to decode the entity. Otherwise, one or more default charsets may be attempted, although this decision is up to each processor. If a processor successfully decodes an Entity or Part, it should set the charset attribute on the Entity or Part to the name of the successful charset, so that applications can easily re-encode or transcode the value if they wish.
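A minimal sketch of the charset rule described above, assuming a processor that uses the declared charset when there is one and otherwise tries its defaults in order. decode_with_charsets is a hypothetical name; a real processor would typically turn the final failure into an HTTP 400 error rather than a ValueError.

```python
def decode_with_charsets(data, declared=None, attempt_charsets=("utf-8",)):
    """Return (text, charset) for the first charset that decodes data cleanly.

    Hypothetical helper: the declared charset (from Content-Type) wins;
    otherwise each default charset is tried in order.
    """
    candidates = [declared] if declared else list(attempt_charsets)
    for charset in candidates:
        try:
            return data.decode(charset), charset
        except (UnicodeDecodeError, LookupError):
            # Bad bytes for this charset, or an unknown charset name.
            continue
    raise ValueError("could not decode with any of %r" % (candidates,))


# b"caf\xe9" is not valid UTF-8, so the second default is used.
print(decode_with_charsets(b"caf\xe9", attempt_charsets=["utf-8", "iso-8859-1"]))
# ('café', 'iso-8859-1')
```

The returned charset name is what a processor would record on entity.charset so applications can re-encode the value later.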

If the Content-Type of the request entity is of major type ‘multipart’, then the above parsing process, and possibly a decoding process, is performed for each part.

For both the full entity and multipart parts, a Content-Disposition header may be used to fill name and filename attributes on the request.body or the Part.
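To show which values end up in name and filename, here is a sketch that parses a Content-Disposition value with the standard library's email.message.Message. This is a swapped-in technique for illustration; CherryPy uses its own header parsing, and disposition_name_and_filename is a hypothetical helper.

```python
from email.message import Message


def disposition_name_and_filename(header_value):
    """Return the (name, filename) parameters of a Content-Disposition value.

    Hypothetical helper built on the stdlib email parser; missing
    parameters come back as None.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_param("name", header="content-disposition")
    filename = msg.get_filename()
    return name, filename


print(disposition_name_and_filename('form-data; name="upload"; filename="cat.gif"'))
# ('upload', 'cat.gif')
```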

Custom Processors

You can add your own processors for any specific or major MIME type. Simply add them to the processors dict in a hook/tool that runs at on_start_resource or before_request_body. Here’s a version of the built-in JSON tool as an example:

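import json

import cherrypy


def json_in(force=True, debug=False):
    request = cherrypy.serving.request

    def json_processor(entity):
        """Read application/json data into request.json."""
        # A body without Content-Length gets 411 Length Required.
        if not entity.headers.get("Content-Length", ""):
            raise cherrypy.HTTPError(411)

        # entity.fp knows its own length, so this read cannot hang.
        body = entity.fp.read()
        try:
            # UnicodeDecodeError and JSONDecodeError both subclass ValueError.
            request.json = json.loads(body.decode('utf-8'))
        except ValueError:
            raise cherrypy.HTTPError(400, 'Invalid JSON document')

    if force:
        # Drop all other processors so only JSON bodies are parsed...
        request.body.processors.clear()
        # ...and answer any other Content-Type with 415.
        request.body.default_proc = cherrypy.HTTPError(
            415, 'Expected an application/json content type')
    request.body.processors['application/json'] = json_processor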

We begin by defining a new json_processor function to put in the processors dictionary. Every processor function takes a single argument: the Entity instance it is to process. It is called whenever a request with a Content-Type of “application/json” arrives at a URI where the tool is turned on.

First, it checks that a Content-Length header is present (raising 411 Length Required if it is not), then reads the remaining bytes from the socket. The fp object knows its own length, so it won’t hang waiting for data that never arrives; it returns once all the data has been read. Then we decode those bytes using Python’s built-in json module and store the decoded result on request.json. If the body cannot be decoded, we raise 400.

If the “force” argument is True (the default), the Tool clears the processors dict so that request entities of other Content-Types aren’t parsed at all. Because no processor is registered for those media types, the default_proc method of cherrypy.request.body is called instead. By default it does nothing (usually so that the page handler has a chance to handle the body itself), but in our case we want to raise 415 Unsupported Media Type, so we replace request.body.default_proc with the error (HTTPError instances raise themselves when called).
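The trick of assigning an exception instance to default_proc works because an instance attribute shadows the class method, and a callable exception can raise itself. The sketch below reproduces that pattern with hypothetical stand-ins (RaisingError and Body); it does not use CherryPy's real HTTPError or RequestBody.

```python
class RaisingError(Exception):
    """Stand-in for an HTTP error whose instances raise themselves when called.

    Hypothetical class for illustration; not cherrypy.HTTPError.
    """

    def __init__(self, status, message):
        super().__init__(status, message)
        self.status = status
        self.message = message

    def __call__(self):
        raise self


class Body:
    """Hypothetical stand-in for request.body with a default_proc hook."""

    def default_proc(self):
        # Default behaviour: do nothing and leave the body unread.
        return None


body = Body()
# The instance attribute now shadows Body.default_proc.
body.default_proc = RaisingError(415, "Expected an application/json content type")
try:
    body.default_proc()
except RaisingError as exc:
    print(exc.status)  # 415
```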

If you only need a custom processor, you can register it without writing a Tool; just add the config entry:

request.body.processors = {'application/json': json_processor}

Note that you can only replace the processors dict wholesale this way, not update the existing one.
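For the config route, the processor cannot close over a request object the way the tool does, so one option is to write its results into entity.params, which for the request body become part of request.params. The sketch below assumes a UTF-8 JSON object body; json_params_processor and the '/api' path are hypothetical, and a production processor would raise cherrypy.HTTPError(400) instead of ValueError for bad input.

```python
import io
import json
from types import SimpleNamespace


def json_params_processor(entity):
    """Decode a JSON object body and expose its keys as body params.

    Hypothetical processor; assumes a UTF-8 encoded JSON object.
    """
    data = json.loads(entity.fp.read().decode("utf-8"))
    if not isinstance(data, dict):
        # A real processor would raise cherrypy.HTTPError(400) here.
        raise ValueError("expected a JSON object")
    entity.params.update(data)


# Hypothetical per-path config replacing the processors dict wholesale.
config = {
    "/api": {
        "request.body.processors": {"application/json": json_params_processor},
    },
}

# Demo with a stand-in entity (SimpleNamespace is used only for illustration).
entity = SimpleNamespace(fp=io.BytesIO(b'{"a": 1, "b": "x"}'), params={})
json_params_processor(entity)
print(entity.params)  # {'a': 1, 'b': 'x'}
```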

Classes

class cherrypy._cpreqbody.Entity(fp, headers, params=None, parts=None)

An HTTP request body, or MIME multipart body.

This class collects information about the HTTP request entity. When a given entity is of MIME type “multipart”, each part is parsed into its own Entity instance, and the set of parts stored in entity.parts.

Between the before_request_body and before_handler hooks, CherryPy tries to process the request body (if any) by calling request.body.process. This uses the content_type of the Entity to look up a suitable processor in Entity.processors, a dict. If a matching processor cannot be found for the complete Content-Type, it tries again using the major type. For example, if a request with an entity of type “image/jpeg” arrives, but no processor can be found for that complete type, then one is sought for the major type “image”. If a processor is still not found, then the default_proc method of the Entity is called (which does nothing by default; you can override this too).

CherryPy includes processors for the “application/x-www-form-urlencoded” type, the “multipart/form-data” type, and the “multipart” major type. CherryPy 3.2 processes these types almost exactly as older versions did. Parts are passed as arguments to the page handler using their Content-Disposition.name if given, otherwise in a generic “parts” argument. Each such part is either a string or, if it is a file, the Part itself (in which case it will have file and filename attributes, or possibly a value attribute). Each Part is itself a subclass of Entity, and has its own process method and processors dict.
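Because a form argument may arrive either as a string or as a Part, page handler code often needs to normalize it. The helper below is a hypothetical sketch (field_bytes is not a CherryPy function) that assumes file parts expose a readable .file and other parts a .value, as described above; SimpleNamespace objects stand in for real Part instances.

```python
import io
from types import SimpleNamespace


def field_bytes(value, encoding="utf-8"):
    """Return a form argument's content as bytes, whether str or Part-like.

    Hypothetical helper; assumes Part-like objects have .file or .value.
    """
    if isinstance(value, str):
        return value.encode(encoding)
    fobj = getattr(value, "file", None)
    if fobj is not None:
        fobj.seek(0)  # rewind in case the file was already read
        return fobj.read()
    inner = getattr(value, "value", None)
    if isinstance(inner, str):
        return inner.encode(encoding)
    return inner


upload = SimpleNamespace(file=io.BytesIO(b"GIF89a..."), filename="cat.gif", value=None)
print(field_bytes("plain text"))  # b'plain text'
print(field_bytes(upload))        # b'GIF89a...'
```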

There is a separate processor for the “multipart” major type which is more flexible, and simply stores all multipart parts in request.body.parts. You can enable it with:

cherrypy.request.body.processors['multipart'] = _cpreqbody.process_multipart

in an on_start_resource tool.

attempt_charsets = ['utf-8']

A list of strings, each of which should be a known encoding.

When the Content-Type of the request body warrants it, each of the given encodings will be tried in order. The first one to successfully decode the entity without raising an error is stored as entity.charset. This defaults to ['utf-8'] (plus ‘ISO-8859-1’ for “text/*” types, as required by HTTP/1.1), but ['us-ascii', 'utf-8'] for multipart parts.

charset = None

The name of the charset that successfully decoded the entity; see “attempt_charsets” above.

content_type = None

The value of the Content-Type request header.

If the Entity is part of a multipart payload, this will be the Content-Type given in the MIME headers for this part.

default_content_type = 'application/x-www-form-urlencoded'

This defines a default Content-Type to use if no Content-Type header is given. The empty string is used for RequestBody, which results in the request body not being read or parsed at all. This is by design; a missing Content-Type header in the HTTP request entity is an error at best, and a security hole at worst. For multipart parts, however, the MIME spec declares that a part with no Content-Type defaults to “text/plain” (see Part).

default_proc()

Called if a more-specific processor is not found for the Content-Type.

filename = None

The “filename” parameter of the Content-Disposition header, if any.

fp = None

The readable socket file object.

fullvalue()

Return this entity as a string, whether stored in a file or not.

headers = None

A dict of request/multipart header names and values.

This is a copy of the request.headers for the request.body; for multipart parts, it is the set of headers for that part.

length = None

The value of the Content-Length header, if provided.

make_file()

Return a file-like object into which the request body will be read.

By default, this will return a TemporaryFile. Override as needed. See also cherrypy._cpreqbody.Part.maxrambytes.

name = None

The “name” parameter of the Content-Disposition header, if any.

params = None

If the request Content-Type is ‘application/x-www-form-urlencoded’ or multipart, this will be a dict of the params pulled from the entity body; that is, it will be the portion of request.params that come from the message body (sometimes called “POST params”, although they can be sent with various HTTP method verbs). This value is set between the ‘before_request_body’ and ‘before_handler’ hooks (assuming that process_request_body is True).

part_class

The class used for multipart parts.

You can replace this with custom subclasses to alter the processing of multipart parts.

alias of Part

parts = None

A list of Part instances if Content-Type is of major type “multipart”.

process()

Execute the best-match processor for the given media type.

processors = {'multipart': process_multipart, 'multipart/form-data': process_multipart_form_data, 'application/x-www-form-urlencoded': process_urlencoded}

A dict of Content-Type names to processor methods.

read_into_file(fp_out=None)

Read the request body into fp_out (or make_file() if None).

Return fp_out.

type

A deprecated alias for content_type.

class cherrypy._cpreqbody.Part(fp, headers, boundary)

Bases: cherrypy._cpreqbody.Entity

A MIME part entity, part of a multipart entity.

attempt_charsets = ['us-ascii', 'utf-8']

A list of strings, each of which should be a known encoding.

When the Content-Type of the request body warrants it, each of the given encodings will be tried in order. The first one to successfully decode the entity without raising an error is stored as entity.charset. This defaults to ['utf-8'] (plus ‘ISO-8859-1’ for “text/*” types, as required by HTTP/1.1), but ['us-ascii', 'utf-8'] for multipart parts.

boundary = None

The MIME multipart boundary.

default_content_type = 'text/plain'

This defines a default Content-Type to use if no Content-Type header is given. The empty string is used for RequestBody, which results in the request body not being read or parsed at all. This is by design; a missing Content-Type header in the HTTP request entity is an error at best, and a security hole at worst. For multipart parts, however (this class), the MIME spec declares that a part with no Content-Type defaults to “text/plain”.

default_proc()

Called if a more-specific processor is not found for the Content-Type.

fullvalue()

Return this entity as a string, whether stored in a file or not.

make_file()

Return a file-like object into which the request body will be read.

By default, this will return a TemporaryFile. Override as needed. See also cherrypy._cpreqbody.Part.maxrambytes.

maxrambytes = 1000

The threshold of bytes after which point the Part will store its data in a file (generated by make_file) instead of a string. Defaults to 1000, just like the cgi module in Python’s standard library.
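The threshold behaviour can be sketched as "buffer in memory, then spill to a file once the data grows past maxrambytes". This is a simplified illustration, not Part's actual code (which reads line by line up to a boundary); read_with_spill is hypothetical, and make_file defaults to tempfile.TemporaryFile to match the documented default of make_file().

```python
import io
import tempfile


def read_with_spill(chunks, maxrambytes=1000, make_file=tempfile.TemporaryFile):
    """Collect chunks in memory, moving to a file once maxrambytes is exceeded.

    Hypothetical sketch. Returns (value, file); exactly one is not None.
    """
    buf = io.BytesIO()
    fp_out = None
    for chunk in chunks:
        if fp_out is None:
            buf.write(chunk)
            if buf.tell() > maxrambytes:
                # Threshold crossed: copy the buffered bytes into a real file.
                fp_out = make_file()
                fp_out.write(buf.getvalue())
                buf = None
        else:
            fp_out.write(chunk)
    if fp_out is None:
        return buf.getvalue(), None
    fp_out.seek(0)
    return None, fp_out


value, f = read_with_spill([b"abc", b"def"])
print(value, f)  # b'abcdef' None
```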

part_class

alias of Part

process()

Execute the best-match processor for the given media type.

read_into_file(fp_out=None)

Read the request body into fp_out (or make_file() if None).

Return fp_out.

read_lines_to_boundary(fp_out=None)

Read bytes from self.fp and return or write them to a file.

If the ‘fp_out’ argument is None (the default), all bytes read are returned in a single byte string.

If the ‘fp_out’ argument is not None, it must be a file-like object that supports the ‘write’ method; all bytes read will be written to the fp, and that fp is returned.
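A simplified sketch of reading up to a MIME boundary, assuming the input is a binary file-like object with readline(). It shows the key detail that the line break just before the delimiter belongs to the delimiter, not the data. read_to_boundary is hypothetical; CherryPy's real method also handles very long lines and spooling to a file via maxrambytes.

```python
import io


def read_to_boundary(fp, boundary, fp_out=None):
    """Read lines from fp up to a '--boundary' line (simplified sketch).

    Returns the bytes read, or writes them to fp_out and returns fp_out.
    """
    delim = b"--" + boundary
    chunks = []
    prev = b""
    while True:
        line = fp.readline()
        if not line:
            raise EOFError("boundary not found")
        stripped = line.rstrip(b"\r\n")
        if stripped == delim or stripped == delim + b"--":
            # The line break before the delimiter is not part of the data.
            if prev.endswith(b"\r\n"):
                prev = prev[:-2]
            elif prev.endswith(b"\n"):
                prev = prev[:-1]
            chunks.append(prev)
            break
        chunks.append(prev)
        prev = line
    data = b"".join(chunks)
    if fp_out is None:
        return data
    fp_out.write(data)
    return fp_out


print(read_to_boundary(io.BytesIO(b"hello\r\nworld\r\n--XYZ--\r\n"), b"XYZ"))
# b'hello\r\nworld'
```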

type

A deprecated alias for content_type.

class cherrypy._cpreqbody.RequestBody(fp, headers, params=None, request_params=None)

Bases: cherrypy._cpreqbody.Entity

The entity of the HTTP request.

bufsize = 8192

The buffer size used when reading the socket.

default_content_type = ''

This defines a default Content-Type to use if no Content-Type header is given. The empty string is used for RequestBody, which results in the request body not being read or parsed at all. This is by design; a missing Content-Type header in the HTTP request entity is an error at best, and a security hole at worst. For multipart parts, however, the MIME spec declares that a part with no Content-Type defaults to “text/plain” (see Part).

default_proc()

Called if a more-specific processor is not found for the Content-Type.

fullvalue()

Return this entity as a string, whether stored in a file or not.

make_file()

Return a file-like object into which the request body will be read.

By default, this will return a TemporaryFile. Override as needed. See also cherrypy._cpreqbody.Part.maxrambytes.

maxbytes = None

Raise MaxSizeExceeded if more bytes than this are read from the socket.

part_class

alias of Part

process()

Process the request entity based on its Content-Type.

read_into_file(fp_out=None)

Read the request body into fp_out (or make_file() if None).

Return fp_out.

type

A deprecated alias for content_type.

class cherrypy._cpreqbody.SizedReader(fp, length, maxbytes, bufsize=8192, has_trailers=False)

read(size=None, fp_out=None)

Read bytes from the request body and return or write them to a file.

At most ‘size’ bytes are read off the socket. The actual number of bytes read is tracked in self.bytes_read. It may be smaller than ‘size’ when 1) the client sends fewer bytes, 2) the ‘Content-Length’ request header specifies fewer bytes than requested, or 3) the number of bytes read exceeds self.maxbytes (in which case 413 is raised).

If the ‘fp_out’ argument is None (the default), all bytes read are returned in a single byte string.

If the ‘fp_out’ argument is not None, it must be a file-like object that supports the ‘write’ method; all bytes read will be written to the fp, and None is returned.
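The rules above can be sketched as a reader that never reads past Content-Length and fails once more than maxbytes have been consumed. BoundedReader and MaxSizeExceeded here are hypothetical stand-ins, not CherryPy classes; the sketch assumes a known Content-Length and ignores chunked encoding, trailers, and bufsize.

```python
import io


class MaxSizeExceeded(Exception):
    """Hypothetical stand-in for the condition behind a 413 response."""


class BoundedReader:
    """Simplified sketch of a Content-Length-bounded body reader."""

    def __init__(self, fp, length, maxbytes=None):
        self.fp = fp
        self.length = length      # value of the Content-Length header
        self.maxbytes = maxbytes  # None means no limit
        self.bytes_read = 0

    def read(self, size=None, fp_out=None):
        # Never read past the declared Content-Length.
        remaining = self.length - self.bytes_read
        if size is None or size > remaining:
            size = remaining
        data = self.fp.read(size)
        self.bytes_read += len(data)
        if self.maxbytes is not None and self.bytes_read > self.maxbytes:
            raise MaxSizeExceeded()
        if fp_out is None:
            return data
        fp_out.write(data)
        return None


reader = BoundedReader(io.BytesIO(b"hello world, extra"), length=11)
print(reader.read(5))  # b'hello'
print(reader.read())   # b' world'
```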

readline(size=None)

Read a line from the request body and return it.

readlines(sizehint=None)

Read lines from the request body and return them.

Functions

cherrypy._cpreqbody.process_urlencoded(entity)

Read application/x-www-form-urlencoded data into entity.params.
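A sketch of the same idea using the standard library's urllib.parse.parse_qsl rather than CherryPy's own parser. urlencoded_to_params is a hypothetical helper; it assumes the raw body is ASCII (as urlencoded data should be), decodes percent-escapes with the given charset, and collects repeated keys into a list.

```python
from urllib.parse import parse_qsl


def urlencoded_to_params(body, charset="utf-8"):
    """Parse an application/x-www-form-urlencoded body into a params dict.

    Hypothetical sketch; repeated keys become lists of values.
    """
    params = {}
    pairs = parse_qsl(body.decode("ascii"), keep_blank_values=True,
                      encoding=charset, errors="strict")
    for key, value in pairs:
        if key in params:
            if not isinstance(params[key], list):
                params[key] = [params[key]]
            params[key].append(value)
        else:
            params[key] = value
    return params


print(urlencoded_to_params(b"a=1&b=caf%C3%A9&a=3&c="))
# {'a': ['1', '3'], 'b': 'café', 'c': ''}
```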

cherrypy._cpreqbody.process_multipart(entity)

Read all multipart parts into entity.parts.

cherrypy._cpreqbody.process_multipart_form_data(entity)

Read all multipart/form-data parts into entity.parts or entity.params.
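The split between entity.params and entity.parts can be sketched as follows, assuming each part exposes name, filename and fullvalue() as described for Part above. distribute_parts is a hypothetical helper and SimpleNamespace objects stand in for real parts: named fields become values, named file uploads stay as Part objects, repeated names collect into lists, and unnamed parts are kept aside.

```python
from types import SimpleNamespace


def distribute_parts(parts, params):
    """Move named form-data parts into params; return the unnamed ones.

    Hypothetical sketch of the behaviour described above.
    """
    kept = []
    for part in parts:
        if part.name is None:
            kept.append(part)
            continue
        # Regular fields become their value; uploads keep the whole Part
        # so callers can reach .file and .filename.
        value = part.fullvalue() if part.filename is None else part
        if part.name in params:
            if not isinstance(params[part.name], list):
                params[part.name] = [params[part.name]]
            params[part.name].append(value)
        else:
            params[part.name] = value
    return kept


upload_part = SimpleNamespace(name="file", filename="cat.gif", fullvalue=lambda: b"GIF")
parts = [
    SimpleNamespace(name="tag", filename=None, fullvalue=lambda: "a"),
    SimpleNamespace(name="tag", filename=None, fullvalue=lambda: "b"),
    upload_part,
    SimpleNamespace(name=None, filename=None, fullvalue=lambda: "x"),
]
params = {}
leftover = distribute_parts(parts, params)
print(params["tag"], len(leftover))  # ['a', 'b'] 1
```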