
change result #148


Open · wants to merge 3 commits into base: result-as-tuple

Conversation

@bymoye (Contributor) commented Jul 1, 2025

No description provided.

@bymoye (Contributor, Author) commented Jul 1, 2025

I also modified one thing in the code (after doing some quick research):

.call((raw_bytes_data.to_vec(),), None)?
to:

.call1((PyBytes::new(py, raw_bytes_data),))?
Reason:
This reduces data copying from 2 operations (Rust Vec creation + Python internal copy) to 1 operation (direct copy to Python memory).
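For illustration, here is a minimal PyO3 sketch of the two call styles (the surrounding function and the `decoder` / `raw_bytes_data` names are placeholders, not the actual psqlpy code):

```rust
use pyo3::prelude::*;
use pyo3::types::PyBytes;

// Hypothetical helper used only to contrast the two call styles; the real
// psqlpy code is structured differently.
fn decode_value(
    py: Python<'_>,
    decoder: &Bound<'_, PyAny>,
    raw_bytes_data: &[u8],
) -> PyResult<Py<PyAny>> {
    // Before: the slice is first copied into a Rust Vec<u8>, and PyO3 then
    // converts that Vec into a Python bytes object (two copies of the payload).
    // let result = decoder.call((raw_bytes_data.to_vec(),), None)?;

    // After: the Python bytes object is built directly from the slice, so the
    // payload is copied once, straight into Python-owned memory.
    let result = decoder.call1((PyBytes::new(py, raw_bytes_data),))?;

    Ok(result.unbind())
}
```

`PyBytes::new` copies the slice straight into a Python-owned buffer, so no intermediate `Vec<u8>` is needed.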

@bymoye (Contributor, Author) commented Jul 1, 2025

@chandr-andr, in 8cc74de I made further optimizations to improve performance. I only ran simple tests with pytest, no benchmarks. Could you run some? If there's a performance regression, feel free to roll back the code.
It would also be good to add a memory-usage test!

@chandr-andr (Member) commented:

@bymoye Thank you very much for the refactoring, it looks awesome.
I can't run the benchmarks right now because I have to work (unfortunately).

I'll run them in the evening.

@chandr-andr (Member) commented:

@bymoye I ran the benchmarks and found a difference. It's not big, but nothing got worse.

@chandr-andr (Member) commented:

The rest of the changes look awesome, thanks a lot!

@bymoye (Contributor, Author) commented Jul 2, 2025

OK! Thank you for testing.

@bymoye (Contributor, Author) commented Jul 2, 2025

@chandr-andr I checked your code, and I think the current test isn't quite right. An engine benchmark should measure the performance of the engine, not of the database. So the query should not use ORDER BY RANDOM; instead it should put as little pressure on the database as possible (so the database is essentially never the bottleneck) and always return the same result to Python. The data conversion and the other work are then done by the engine.
We can return as much data as possible to make it a meaningful performance test.

@bymoye (Contributor, Author) commented Jul 2, 2025

For example, a single-row query could use SELECT 1;
queries returning multiple rows could filter with id < 101, id < 1001, and id < 10001;
and they should return the actual data instead of the current "ok".

@bymoye (Contributor, Author) commented Jul 2, 2025

Time complexity: the original code is $O(N \cdot D)$, while the optimized code reduces it to $O(N)$, which significantly lowers the complexity and prevents performance degradation.

Space complexity: both versions are $O(N)$.
- Optimized: memory is allocated once and reused throughout, keeping allocation behavior stable and efficient.
- Original: potential memory jitter (repeated allocation and deallocation) introduces additional overhead.

Memory copies:
- Original: $N \cdot (D - 1)$ copy operations (handled internally by Rust).
- Optimized: reduced to $0$ copies, since everything is accessed through references.

Total copied elements:
- Original: $N \cdot (D + 1)$ elements copied.
- Optimized: reduced to $2N$ elements.

These optimizations should bring significant performance improvements when reading query result values (especially for large result sets).
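As a rough illustration of the pattern behind these numbers (a generic sketch, not the psqlpy source; the function names and `raw_values` are made up, and N here is simply the number of returned values):

```rust
// Illustrative sketch only, not the psqlpy source: it contrasts the two
// patterns described above (repeated copying into growing buffers versus a
// single up-front allocation plus reference-based access) for N raw values.

// Original-style: every value is cloned into the output, so each payload is
// copied in full, and the output Vec reallocates as it grows because no
// capacity is reserved up front.
fn collect_copying(raw_values: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    for value in raw_values {
        out.push(value.clone()); // full copy of every payload
    }
    out
}

// Optimized-style: one allocation with the known length, and the values are
// exposed as references, so no payload bytes are copied at all.
fn collect_by_reference(raw_values: &[Vec<u8>]) -> Vec<&[u8]> {
    let mut out = Vec::with_capacity(raw_values.len()); // single allocation
    for value in raw_values {
        out.push(value.as_slice()); // reference only, no copy
    }
    out
}
```

The second variant allocates once with the known length and never touches the payload bytes, which is the behavior the "allocated once" and "via references" points above describe.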
