Skip to content

Use environment variables to use std::regex #222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

hkxs
Copy link

@hkxs hkxs commented Dec 30, 2024

RapidJson uses an in-house regex engine that has limited syntax, we can use std::regex by defining

  • RAPIDJSON_SCHEMA_USE_INTERNALREGEX=0
  • RAPIDJSON_SCHEMA_USE_STDREGEX=1

This patch extends that functionality to python-rapidjson by using the environmental variable RAPIDJSON_SCHEMA_USE_STDREGEX

For example, the following schema uses negative-lookahead (not supported by default engine):

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "some_id",
    "type": "object",
    "additionalProperties": False,
    "required": ["metadata"],
    "patternProperties": {
        "^(?!metadata$).+": {
            "type": "string"
        }
    },
    "properties": {
        "metadata": {
            "type": "integer"
        }
    }
}

Using this schema leads to a ValidationError using this json {"a": "a", "metadata": 1}:

>>> import rapidjson   
>>> schema_dict = {
       "$schema": "https://json-schema.org/draft/2020-12/schema",
       "$id": "some_id",
       "type": "object",
       "additionalProperties": False,
       "required": ["metadata"],
       "patternProperties": {
           "^(?!metadata$).+": {
               "type": "string"
           }
       },
       "properties": {
           "metadata": {
               "type": "integer"
           }
       }
   }   
>>> validator = rapidjson.Validator(rapidjson.dumps(schema_dict))
>>> validator(rapidjson.dumps({"a": "a", "metadata": 1}))
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    validator(rapidjson.dumps({"a": "a", "metadata": 1}))
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rapidjson.ValidationError: ('additionalProperties', '#', '#/a')

Expected behavior:

>>> import rapidjson   
>>> schema_dict = {
       "$schema": "https://json-schema.org/draft/2020-12/schema",
       "$id": "some_id",
       "type": "object",
       "additionalProperties": False,
       "required": ["metadata"],
       "patternProperties": {
           "^(?!metadata$).+": {
               "type": "string"
           }
       },
       "properties": {
           "metadata": {
               "type": "integer"
           }
       }
   }   
>>> validator = rapidjson.Validator(rapidjson.dumps(schema_dict))
>>> validator(rapidjson.dumps({"a": "a", "metadata": 1}))  # no error raised by the validator

@lelit
Copy link
Contributor

lelit commented Dec 31, 2024

Thanks for the improvement.

Ideally we should have a test case for the functionality, even just what you wrote above, but at a minimum I'd say we should expose the availability of StdRegex on the module, say RAPIDJSON_SCHEMA_USES_STDREGEX or something like that, similar to the RAPIDJSON_VERSION attribute.

What do you think?

@hkxs
Copy link
Author

hkxs commented Jan 2, 2025

Thanks for the improvement.

Ideally we should have a test case for the functionality, even just what you wrote above, but at a minimum I'd say we should expose the availability of StdRegex on the module, say RAPIDJSON_SCHEMA_USES_STDREGEX or something like that, similar to the RAPIDJSON_VERSION attribute.

What do you think?

You're right it is good to have it exposed on the module, I added it (but I'm not sure if I should add something on typings/rapidjson/init.pyi, let me know if I'm missing something there). I also added an small test case to validate that we can use other regex when the variable is set

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants