gh-135676: Lexical analysis: Reword String literals and related sections #135942

Open
wants to merge 10 commits into
base: main
63 changes: 60 additions & 3 deletions Doc/reference/expressions.rst
@@ -133,13 +133,18 @@ Literals

Python supports string and bytes literals and various numeric literals:

.. productionlist:: python-grammar
   literal: `stringliteral` | `bytesliteral` | `NUMBER`
.. grammar-snippet::
   :group: python-grammar

   literal: `strings` | `NUMBER`

Evaluation of a literal yields an object of the given type (string, bytes,
integer, floating-point number, complex number) with the given value. The value
may be approximated in the case of floating-point and imaginary (complex)
literals. See section :ref:`literals` for details.
literals.
See section :ref:`literals` for details.
See section :ref:`string-concatenation` for details on ``strings``.
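
The approximation mentioned above can be observed directly. The following is a minimal sketch, not part of the change itself; the variable names are illustrative only. It uses ``decimal.Decimal`` to expose the exact value stored in the float object that a literal yields:

```python
from decimal import Decimal

# The literal 0.1 has no exact binary floating-point representation,
# so the float object it yields holds the nearest representable value.
stored_value = Decimal(0.1)      # exact value held by the float object
written_value = Decimal("0.1")   # the decimal value as written in the source

print(stored_value == written_value)

# 0.5 is a power of two, so this literal is represented exactly.
print(Decimal(0.5) == Decimal("0.5"))

# Each kind of literal yields an object of the corresponding type.
literal_types = [type(1), type(1.0), type(1j), type("s"), type(b"s")]
print(literal_types)
```

Integer, string, and bytes literals are never approximated; only floating-point and imaginary literals are affected.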


.. index::
triple: immutable; data; type
@@ -152,6 +157,58 @@ occurrence) may obtain the same object or a different object with the same
value.


.. _string-concatenation:

String literal concatenation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Multiple adjacent string or bytes literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is the same
as their concatenation::

   >>> "hello" 'world'
   'helloworld'

Formally:

.. grammar-snippet::
   :group: python-grammar

   strings: ( `STRING` | fstring)+ | tstring+

Note that this feature is defined at the syntactical level, so it only works
with literals.
To concatenate string expressions at run time, the ``+`` operator may be used::

   >>> greeting = "Hello"
   >>> space = " "
   >>> name = "Blaise"
   >>> print(greeting + space + name)   # not: print(greeting space name)
   Hello Blaise
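
One way to see that concatenation happens at the syntactic level is to inspect the parsed tree. The sketch below is illustrative only and uses the standard ``ast`` module; the variable names are not taken from the change:

```python
import ast

# Adjacent literals are merged before run time: the whole expression
# parses to a single constant node, with no addition involved.
tree = ast.parse('"hello" \'world\'', mode="eval")
merged = tree.body
print(type(merged).__name__, repr(merged.value))

# Adjacent *names* are not literals, so the same layout is rejected.
try:
    ast.parse("greeting space name", mode="eval")
except SyntaxError:
    names_are_rejected = True
else:
    names_are_rejected = False
print(names_are_rejected)
```

By contrast, ``greeting + space + name`` parses to ``BinOp`` nodes that are evaluated when the expression runs.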

Also note that literal concatenation can freely mix raw strings,
triple-quoted strings, and formatted string literals. For example::

   >>> "Hello" r', ' f"{name}!"
   'Hello, Blaise!'

However, bytes literals may only be combined with other bytes literals;
not with string literals of any kind.
Also, template string literals may only be combined with other template
string literals::

   >>> t"Hello" t"{name}!"
   Template(strings=('Hello', '!'), interpolations=(...))
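
The restriction on bytes literals is enforced when the source is compiled, which the following sketch demonstrates. The helper ``compiles`` is hypothetical and exists only for this illustration. Template strings are left out so that the example also runs on Python versions that do not support them:

```python
def compiles(source):
    """Return True if *source* is accepted as an expression by the compiler."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# Bytes literals may be concatenated with other bytes literals ...
bytes_only = compiles('b"ab" b"cd"')
# ... but not with string literals of any kind.
bytes_and_str = compiles('b"ab" "cd"')
# Plain, raw, and formatted string literals may be mixed freely.
mixed_strings = compiles('"a" r"\\d" f"{1}"')

print(bytes_only, bytes_and_str, mixed_strings)
```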

This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across long lines, or even to add
comments to parts of strings. For example::

   re.compile("[A-Za-z_]"       # letter or underscore
              "[A-Za-z0-9_]*"   # letter, digit or underscore
             )
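
A runnable variant of the example above, as a sketch: the name ``identifier`` and the test strings are illustrative assumptions. It shows that the commented pieces form one ordinary pattern string:

```python
import re

identifier = re.compile("[A-Za-z_]"       # letter or underscore
                        "[A-Za-z0-9_]*"   # letter, digit or underscore
                        )

# The two literals were joined into a single pattern; the comments are
# ordinary source comments and do not become part of the string.
print(identifier.pattern)

matches_name = identifier.fullmatch("spam_1") is not None
matches_digit_first = identifier.fullmatch("1spam") is not None
print(matches_name, matches_digit_first)
```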


.. _parenthesized:

Parenthesized forms
5 changes: 1 addition & 4 deletions Doc/reference/grammar.rst
@@ -10,11 +10,8 @@ error recovery.

The notation used here is the same as in the preceding docs,
and is described in the :ref:`notation <notation>` section,
except for a few extra complications:
except for an extra complication:

* ``&e``: a positive lookahead (that is, ``e`` is required to match but
not consumed)
* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
* ``~`` ("cut"): commit to the current alternative and fail the rule
even if this fails to parse

16 changes: 12 additions & 4 deletions Doc/reference/introduction.rst
@@ -145,15 +145,23 @@ The definition to the right of the colon uses the following syntax elements:
* ``e?``: A question mark has exactly the same meaning as square brackets:
the preceding item is optional.
* ``(e)``: Parentheses are used for grouping.

The following notation is only used in
:ref:`lexical definitions <notation-lexical-vs-syntactic>`.

* ``"a"..."z"``: Two literal characters separated by three dots mean a choice
of any single character in the given (inclusive) range of ASCII characters.
This notation is only used in
:ref:`lexical definitions <notation-lexical-vs-syntactic>`.
* ``<...>``: A phrase between angular brackets gives an informal description
of the matched symbol (for example, ``<any ASCII character except "\">``),
or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
This notation is only used in
:ref:`lexical definitions <notation-lexical-vs-syntactic>`.

.. _lexical-lookaheads:

Some definitions also use *lookaheads*, which indicate that an element
must (or must not) match at a given position, but without consuming any input:

* ``&e``: a positive lookahead (that is, ``e`` is required to match)
* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
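
As an analogy only (this is not the grammar notation itself), regular expressions in Python's ``re`` module provide comparable zero-width assertions, written ``(?=...)`` and ``(?!...)``. The sketch below, with illustrative variable names, shows that a lookahead constrains what follows without consuming it:

```python
import re

# Positive lookahead: "a" matches only where "ab" starts.
# The lookahead itself consumes nothing, so the match ends after "a".
positive = re.match(r"(?=ab)a", "abc")
print(positive.group(), positive.end())

# Negative lookahead: "a" matches only where "ab" does NOT start.
negative_blocked = re.match(r"(?!ab)a", "abc")
negative_allowed = re.match(r"(?!ab)a", "acd")
print(negative_blocked, negative_allowed.group())
```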

The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
the vertical bar (``|``) binds most loosely.