Pulse · tensorflow/tensorflow · GitHub

June 24, 2025 – July 1, 2025

Overview

334 Active pull requests

17 Active issues

241 Pull requests merged by 6 people

[oneDNN][CPU] fuse a matmul pattern
#86172 merged Jul 1, 2025
[XLA:GPU] check if producer / consumer can be fused in CanFuseTriton
#96184 merged Jul 1, 2025
[XLA:CPU] Set loaded workgroup as invariant & fix incorrect alignment metadata.
#96067 merged Jul 1, 2025
[XLA:GPU] Fix the implementation of the TMA viability filter to conform to the Nvidia documentation
#96185 merged Jul 1, 2025
Add use_shardy_partitioner in TPUCompileMetadataProto and TpuAotCompilationOptions. The default value is false.
#95808 merged Jul 1, 2025
[XLA] Add after target rule kDelay rule and use it in aggressive flexible scheduling
#96173 merged Jul 1, 2025
unicode_ops: Encode noncharacters
#96193 merged Jul 1, 2025
Fix compilation error in tensorflow/python/tfcompile_wrapper.cc on s390x
#95322 merged Jul 1, 2025
Automated Code Change
#96156 merged Jul 1, 2025
Automated Code Change
#96154 merged Jul 1, 2025
PR #28394: [XLA:GPU] Add DynamicSliceCopyFusion command to command buffer.
#96196 merged Jul 1, 2025
Add CreateRawAliasOfBuffer() to CommonPjRtBufferImpl.
#95618 merged Jul 1, 2025
Simplify ComputationPlacer interface.
#96102 merged Jul 1, 2025
Add initial support for reductions in XnnGraphFusion.
#96140 merged Jul 1, 2025
Fix invalid process count bug?
#96091 merged Jul 1, 2025
Fix: Shardy should propagate use_auto_spmd_partitioning field from config
#96206 merged Jul 1, 2025
Update version of rules_ml_toolchain repo.
#96202 merged Jul 1, 2025
1. Limit the max-width and max-height of a cell.
#96189 merged Jun 30, 2025
Integrate StableHLO at openxla/stablehlo@9df3b556
#96200 merged Jun 30, 2025
Refactor py_import macros to avoid unpacking pypi wheels twice.
#95889 merged Jun 30, 2025
Announcement about deprecating tf lite within TF.
#96204 merged Jun 30, 2025
add no_oss_py313 tag to following max python test
#96201 merged Jun 30, 2025
Check operands of elementwise ops offloaded to xnnpack.
#96161 merged Jun 30, 2025
Added a XLA Flags Guidance doc
#95699 merged Jun 30, 2025
Remove device shape & device size bytes functions from HLO runners.
#96045 merged Jun 30, 2025
[xla:pjrt] Add PjRtFuture::TryMap to map futures when map functor itself can fail
#96197 merged Jun 30, 2025
[xla:cpu] Decouple xfeed manager from legacy cpu_runtime
#96096 merged Jun 30, 2025
Rollback due to suspected correctness issues.
#96192 merged Jun 30, 2025
[XLA] Add a length limit on op_name when concatenating in call_inliner.
#96191 merged Jun 30, 2025
Remove the auto assignment list
#96174 merged Jun 30, 2025
Don't handle empty reductions in XNNPACK delegate
#96105 merged Jun 30, 2025
[XLA:GPU] Record reification cost only for supported instructions.
#95960 merged Jun 30, 2025
[XLA:CPU] Add dereferenceable metadata to loaded argument pointer.
#96070 merged Jun 30, 2025
[XLA:GPU] Code cleanups:
#96182 merged Jun 30, 2025
Fix a problem in Shape::Equal in comparing buffer types.
#95910 merged Jun 30, 2025
[XLA:GPU] Make gpu_hlo_schedule testing cover unified latency estimator cases.
#95588 merged Jun 30, 2025
[XLA:CPU][XLA:GPU] Use llvm array types for GEP when lowering tensors.
#96015 merged Jun 30, 2025
Added support for the following to allow PtrVec to replace
#96109 merged Jun 30, 2025
[XLA:GPU] add lock guards for containers in priority_fusion
#96181 merged Jun 30, 2025
Make XNNPack weight cache work with files bigger than 2 GiB on Windows.
#96090 merged Jun 30, 2025
PR #26217: [XLA:GPU] Add Intel GPU specific tags for xla and sysl_status component
#96179 merged Jun 30, 2025
Automated Code Change
#96168 merged Jun 30, 2025
[XLA:GPU] Enforce compile-time exhaustiveness of GemmFusionAutotuner::BackendConfig visitors.
#96178 merged Jun 30, 2025
Automated Code Change
#96155 merged Jun 30, 2025
Automated Code Change
#96167 merged Jun 30, 2025
Automated Code Change
#96151 merged Jun 30, 2025
[HLO Graph Dumper] Add the computation name to the op name when annotating parameters.
#95978 merged Jun 30, 2025
PR #25298: [XLA:GPU] cudnn sdpa flex attention
#96176 merged Jun 30, 2025
[XLA:GPU] Lifting int4_pass above TritonXlaExtractInsertPass. This is necessary to allow both the legacy Triton emitter and the Generic emitter to correctly handle int4. This also includes accommodating new instructions in the pass.
#95944 merged Jun 29, 2025
Automated Code Change
#96145 merged Jun 29, 2025
Do not propagate name_stacks into lower_jaxpr_to_fun.
#96089 merged Jun 28, 2025
Populate the HLO op_type field from JAX StableHLO.
#96085 merged Jun 28, 2025
Automated Code Change
#96136 merged Jun 28, 2025
Refactor checks in xnn_fusion.cc/xnn_emitter.cc.
#96126 merged Jun 28, 2025
Automated Code Change
#96132 merged Jun 28, 2025
Automated Code Change
#96012 merged Jun 28, 2025
Porting the grappler batch prioritization rewriter to TFRT.
#96022 merged Jun 28, 2025
Automated Code Change
#96121 merged Jun 28, 2025
Add test and refactor Device Assignment.
#94904 merged Jun 28, 2025
Nit, ifrt proxy: additional logging to debug connection failures.
#96120 merged Jun 28, 2025
[xla:pjrt] Add PjRtFuture::Map API to map PjRt futures
#96115 merged Jun 28, 2025
Moving two functions from dot_handler.cc to spmd_partitioner_util.cc to be used by other files.
#95725 merged Jun 28, 2025
[XLA:benchmarks] Add README guide for onboarding new benchmarks to OpenXLA.
#95275 merged Jun 28, 2025
Remove JAX_IFRT_VERSION_NUMBER check for types in XLA/JAX
#96035 merged Jun 28, 2025
Fix typo in custom_call.md
#95976 merged Jun 28, 2025
Reduce latency by copy constructing device id vector directly rather than using for loop to create the vector followed by move.
#96117 merged Jun 28, 2025
#HLODiff Don't print parameter numbers in HLO canonical fingerprint.
#96113 merged Jun 28, 2025
[IFRT] Rename Client::GetDefaultLayout() to Client::GetDefaultPjRtLayout()
#96108 merged Jun 28, 2025
[XLA:MSA] Decrease copy resource scaling from 2^50 to 2^40. The 2^50 scaling factor sometimes causes total scores to exceed the max value of int64_t causing ubsan/asan errors. Enable the test failing ubsan errors.
#96027 merged Jun 27, 2025
[XLA] Add a Hlo print option for printing ENTRY keyword
#95970 merged Jun 27, 2025
Cleanup: rename variable_index to variable_arg_index in IfrtServingExecutable to improve code readability
#96107 merged Jun 27, 2025
Fix handling of None values with as_numpy_iterator.
#96009 merged Jun 27, 2025
[IFRT] Rename Array::layout() to Array::pjrt_layout()
#96099 merged Jun 27, 2025
Upgrade Bazel version in tf_keras build script.
#96101 merged Jun 27, 2025
Schedule computation in post order.
#96095 merged Jun 27, 2025
tsl: Add config_setting for running in CI.
#96097 merged Jun 27, 2025
Integrate LLVM at llvm/llvm-project@7a3356951053
#96023 merged Jun 27, 2025
Disable internal precompilation for local runs.
#96029 merged Jun 27, 2025
Add initial support for broadcasts in XnnGraphFusion.
#95912 merged Jun 27, 2025
add tpu_dynamic_registration
#96040 merged Jun 27, 2025
[xla:cpu:onednn] Enable matching dot + eltwise to oneDNN fusions in DotLibraryRewriter
#96094 merged Jun 27, 2025
[XLA] Handle sub-computations more correctly in constant folding.
#96093 merged Jun 27, 2025
Integrate StableHLO at openxla/stablehlo@955fa7e6
#96031 merged Jun 27, 2025
[xla:cpu] Delete ffi support from legacy runtime
#96049 merged Jun 27, 2025
[XLA] Slightly simplify explanation message code in HLO constant folding.
#96082 merged Jun 27, 2025
[xla:cpu:onednn] Refactor onednn_fusion. Put things that depend on oneDNN Graph in a separate file.
#96086 merged Jun 27, 2025
Split utilities for the weight cache code/tests to their own files.
#96013 merged Jun 27, 2025
Update xla_test to set the device as a specific GPU (e.g. h100, a100, etc) and update uses of xla_test_backend_predicates.h utilities to reflect this change
#95983 merged Jun 27, 2025
[xla] Remove unused thunk type
#96068 merged Jun 27, 2025
[xla:cpu] Delete unused cpu runtime symbols
#96048 merged Jun 27, 2025
[XLA] Factor out foldability check in HLO constant folding.
#96084 merged Jun 27, 2025
[xla:gpu] Add support for rendering command buffer execution graphs
#96042 merged Jun 27, 2025
[XLA] Add call folding test to constant propagation.
#96083 merged Jun 27, 2025
[XLA:GPU] Refactor code figuring out a support for unified latency estimator.
#95887 merged Jun 27, 2025
[XLA:GPU] Document interpolation API.
#96072 merged Jun 27, 2025
Fix an error in GetDotGroupPartitionContractingOutputShardings in SPMD dot handler.
#96076 merged Jun 27, 2025
[xla:copy_insertion] Fixed a problem in finding a rotated non-copyable chain.
#95905 merged Jun 27, 2025
PR #28225: [ROCm] Fix invalid rocblas version for rocm 7
#95996 merged Jun 27, 2025
Remove the old Triton search space code now that the new search space landed.
#96004 merged Jun 27, 2025
[xla][gpu] Nest gemm fusion: Update result shape and operand shape consistently.
#96058 merged Jun 27, 2025
PR #27836: Add support for NVSHMEM put/get
#96074 merged Jun 27, 2025
[XLA:GPU] Don't use optional for PropagateTileToInput[Broadcast|Transpose|Slice].
#96078 merged Jun 27, 2025
[XLA:GPU] don't fail in autotuner if a specific config does not fly
#96077 merged Jun 27, 2025
Iterate through all fusion heroes when looking for a transpose emitter.
#96014 merged Jun 27, 2025
[XLA:GPU] Create input buffers from executable instead of the root instruction.
#96007 merged Jun 27, 2025
Reverts fcdc6de87afe454721a374b503899ea7a9f05360
#96037 merged Jun 27, 2025
Avoid overflow for checks to see if there's space available in free_tail_bytes.
#96021 merged Jun 27, 2025
PR #28041: [ROCm] migrate swish fusion to upstream
#96018 merged Jun 27, 2025
PR #28108: Fixed ppc64le onednn build issue
#96039 merged Jun 27, 2025
[XLA:GPU] Create RedzoneBuffers from Executable.
#96010 merged Jun 27, 2025
[Phase Compilation] Part-2: Introduces xla::PjRtPhaseCompiler
#95087 merged Jun 27, 2025
Remove alias_info_ member from HloDataflowAnalysis.
#95935 merged Jun 27, 2025
Automated Code Change
#95752 merged Jun 27, 2025
[xla:gpu] Nest gemm fusion: only hoist bitcasts upwards and support layouts.
#95949 merged Jun 27, 2025
Add support for CeilDiv affine expr.
#95694 merged Jun 27, 2025
Update log for involuntary full rematerialization.
#95948 merged Jun 27, 2025
Cleanup: Remove unnecessary vlogs.
#96047 merged Jun 27, 2025
[xla:cpu] Delete fork join runtime support
#96025 merged Jun 27, 2025
Remove Deprecated cuDNN _v7 API Usage from cuda_dnn
#95703 merged Jun 27, 2025
[XLA:MSA] Reduces the MSA compile time by improving the prefetch allocation time. Since checking for having enough copy resources is an expensive call, we should first check for other necessary conditions of having a prefetch before we come to this check. Therefore, we move the resource availability check down to after the checks for max outstanding copies and copy ordering violations.
#96041 merged Jun 27, 2025
[IFRT IR] Add utility for initializing PassManager and dumping MLIR textual repr after passes
#95980 merged Jun 26, 2025
Bump XNNPACK version for open source builds.
#96038 merged Jun 26, 2025
Increase size of return type of ComputePeakMemory to int64_t.
#96032 merged Jun 26, 2025
Skip CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL When Calling CUPTI ActivityDisable API
#95984 merged Jun 26, 2025
Rollback integration of hermetic C++ toolchains in Tensorflow as they increase the wheel size and raise the value of manyLinux compliancy tag.
#96033 merged Jun 26, 2025
[XLA] In MHLO to HLO conversion, convert the location on the call into opmetadata.
#96026 merged Jun 26, 2025
Add _XlaShardingV2 to tf.XlaShardOp and use it for tf2xla lowering.
#94675 merged Jun 26, 2025
Register ChloDialect in tf_tfl_translate.cc
#95158 merged Jun 26, 2025
Most tests should not use HloRunnerAgnosticTestBase directly.
#95987 merged Jun 26, 2025
[XLA:GPU] implement symbolic tile for slice
#96019 merged Jun 26, 2025
Disable hermetic C++ for CUDA builds because the wheel size is too big.
#95994 merged Jun 26, 2025
[XLA:GPU] Document unified latency estimator behavior.
#96017 merged Jun 26, 2025
Integrate Triton up to [0a4aa69](https://github.com/openai/triton/commits/0a4aa6960599b17290eb942d71d14afc1775b175)
#95969 merged Jun 26, 2025
Add a tuple sharding when creating get-tuple-element(tuple(single_result)).
#95890 merged Jun 26, 2025
Automated Code Change
#95751 merged Jun 26, 2025
[JAX] Remove the PjRt-IFRT dependency from the DCN transfer library to avoid a circular dependency in upcoming changes to PjRt-IFRT to support cross-host transfers.
#95904 merged Jun 26, 2025
[XLA:GPU] Implement upwards tile propagation for BroadcastOp.
#96005 merged Jun 26, 2025
#sdy Sync unreduced axes of input and output of transpose, reshape, and copy.
#95977 merged Jun 26, 2025
Automated Code Change
#96008 merged Jun 26, 2025
De-duplicate zero points for per channel quantized tensors when all the zero points are the same.
#95816 merged Jun 26, 2025
[XLA:GPU] Implement upwards tile propagation for TransposeOp.
#96011 merged Jun 26, 2025
[XLA:CPU] Add nsw to thread indexing truncate
#95964 merged Jun 26, 2025
[XLA:CPU] Force 64 bit index for cpu loop fusions.
#95963 merged Jun 26, 2025
Roll forward after fixing internal tests
#96000 merged Jun 26, 2025
[XLA:GPU] Add default implementation for RunHloPasses.
#95999 merged Jun 26, 2025
[XLA:CPU] Add pass to add reassoc fast math flag to reduction ops.
#95947 merged Jun 26, 2025
[XLA:GPU] check if the generic dot emitter is enabled in nest gemm fusion pass
#95961 merged Jun 26, 2025
PR #27412: Command buffer respect control dependency of HloInstruction when running with concurrent mode.
#95973 merged Jun 26, 2025
[XLA:GPU][Emitters] Create the base class for PackedTranspose and TransposeFusion.
#95943 merged Jun 26, 2025
Add better builder method for PrecisionConfigAttr
#95915 merged Jun 26, 2025
[XLA:GPU] keep layout elements bitsize when hoisting bitcasts
#95955 merged Jun 26, 2025
Exclude llvm-project contents from the TF wheel.
#95988 merged Jun 26, 2025
Make Pjrt C API aware of CompiledMemoryStats::peak_memory_in_bytes.
#95972 merged Jun 26, 2025
[XLA:MSA] Fix bug in alternate memory allocation for min time and forced prefetches in alternate memory where chunks were not being reserved and added to the list of pending chunks. This caused bugs with overlapping chunks when using buffer coloring in alternate memory.
#95986 merged Jun 26, 2025
Add a field to plugin_attributes to indicate whether the PjRt plugin supports cross-host device transfers.
#95812 merged Jun 26, 2025
Disable memory space assignment test case that fails with address sanitization.
#95982 merged Jun 26, 2025
Add better builder to create ResultAccuracyAttr.
#95985 merged Jun 26, 2025
Remove UpdateEntryComputationLayout from HloRunnerPjRt.
#95911 merged Jun 26, 2025
Check combiner weight VJP computation shape before use.
#95822 merged Jun 26, 2025
[IFRT Proxy] Use IFRT SerDes versioning
#95479 merged Jun 26, 2025
[XLA] Don't try to simplify While ops if their body or condition has more than one use.
#95975 merged Jun 26, 2025
Remove the tuple handling in BufferFromHostLiteral
#95933 merged Jun 25, 2025
Integrate LLVM at llvm/llvm-project@bae48ac3c0e6
#95894 merged Jun 25, 2025
Roll forward - add back checking if buffers are available
#95891 merged Jun 25, 2025
[HLO Graph Dumper] Handle tuple inputs to kCall ops gracefully.
#95939 merged Jun 25, 2025
[IFRT IR] Add options to dump IFRT MLIR passes to IfrtIrCompileOptions.
#95968 merged Jun 25, 2025
Add a warning to tflite::Subgraph functions that return pointers that may be invalidated.
#95962 merged Jun 25, 2025
Add ToLiteral and LazyToLiteral to CommonPjRtBufferImpl.
#95829 merged Jun 25, 2025
Move tensorflow/third_party/stablehlo to xla/third_party/stablehlo
#95478 merged Jun 25, 2025
[XLA] Reduce Window Rewriter refactoring to reduce complexity
#95832 merged Jun 25, 2025
[IFRT] Facilitate SerDes version propagation by detecting use of any default SerDes version
#95837 merged Jun 25, 2025
Disable tests for which internal precompilation does not currently work.
#95490 merged Jun 25, 2025
[XLA:MSA] Fix bug in repack allocation blocks -
#95765 merged Jun 25, 2025
Disable worker_tags_test in py3.13 due to timeouts in linux
#95966 merged Jun 25, 2025
Integrate hermetic ML toolchains for TensorFlow.
#95488 merged Jun 25, 2025
#HLODiff Add unmatched computations to summary->computation_diff_patterns
#95833 merged Jun 25, 2025
Move tensorflow/third_party/tensorrt to xla/third_party/tensorrt
#95906 merged Jun 25, 2025
Improve XNNpack weight cache test error logs.
#95952 merged Jun 25, 2025
[jaxlib] Guard new transfer library API calls on JAX_IFRT_VERSION_NUMBER.
#95958 merged Jun 25, 2025
Remove upper bound for numpy
#95895 merged Jun 25, 2025
[XLA:GPU] Small readability fixes, soft-fail CHECKs, pass in proper number of communicators.
#95876 merged Jun 25, 2025
Add an option to enable GPU collective cancelling.
#95899 merged Jun 25, 2025
[XLA:GPU] Canonicalize tile strides of 0 to 1 when eligible.
#95807 merged Jun 25, 2025
Skip in-memory XNNPack weight cache tests on Windows.
#95951 merged Jun 25, 2025
[XLA:GPU] Remove superfluous std::cout.
#95950 merged Jun 25, 2025
Remove no-op wrapper for missed_heartbeat_callback.
#95945 merged Jun 25, 2025
Adds ThreadPool to autotuner.
#95869 merged Jun 25, 2025
PR #28190: [NVIDIA GPU] Add copies for collective memory ops if they are consuming from constant or module inputs
#95938 merged Jun 25, 2025
[xla:cpu] Delete unused sample harness
#95916 merged Jun 25, 2025
Add Serialization support to KernelLoaderSpec
#95932 merged Jun 25, 2025
Do not change the sharding type if there is only one device.
#95582 merged Jun 25, 2025
[HLO Graph Dumper] Add caller instructions to HLO Computations or mark them as ENTRY.
#95867 merged Jun 25, 2025
[XLA:GPU] Store upper bounds of the tile directly in the SymbolicTile.
#95901 merged Jun 25, 2025
PR #28161: [ROCm] Introduce rocm6.4.1 hermetic dependency
#95929 merged Jun 25, 2025
PR #27988: [ROCm] Pass AMDGPU_TARGETS to crosstool wrapper
#95930 merged Jun 25, 2025
Automated Code Change
#95862 merged Jun 25, 2025
PR #27994: Multihost HLO Runner: Add CLI Option for Output Mode
#95936 merged Jun 25, 2025
[XLA:GPU] Adjust nic speed for sol cost model.
#95882 merged Jun 25, 2025
Automated Code Change
#95666 merged Jun 25, 2025
Automated Code Change
#95923 merged Jun 25, 2025
Automated Code Change
#95853 merged Jun 25, 2025
Automated Code Change
#95743 merged Jun 25, 2025
Introduce repo environment variable CUDA_EXTRA_COPTS
#95269 merged Jun 25, 2025
Update ml_dtypes to 0.5.1 to align with JAX and TensorFlow
#95610 merged Jun 25, 2025
[JAX] Add overloads to the DCN transfer library in preparation to remove the PjRt-IFRT dependency.
#95542 merged Jun 25, 2025
Add peak_memory_in_bytes to CompiledMemoryStats.
#95688 merged Jun 25, 2025
Use fallback for reduce ops without bodies (type inference builders)
#95841 merged Jun 25, 2025
Update the version of SIGN operator in reference resolver.
#95914 merged Jun 25, 2025
Fix newly-broken debug_options_flags_test.
#95908 merged Jun 25, 2025
[XLA:CPU] Fix workitem size for small outer dimensions.
#95878 merged Jun 24, 2025
Add source_target_pairs to send/recv ops in StableHLO
#94495 merged Jun 24, 2025
[HLO Graph Dumper] Annotate parameters inside called computations with where the inputs come from.
#95898 merged Jun 24, 2025
Remove AncestorSubgraphSimilarity in favor of the LCS one.
#95815 merged Jun 24, 2025
Fix test failing internally: warnings_bazelrc_test.
#95834 merged Jun 24, 2025
[xla:cpu:xnn] Add more XNN elementwise ops support to DotLibraryRewriter
#95900 merged Jun 24, 2025
Adds flags for NCCL non-blocking communicators and async execution.
#95354 merged Jun 24, 2025
Add CommonPjRtBufferImpl for subclasses
#95549 merged Jun 24, 2025
[ #HLODiff ] Increase the weight of the parent sim and adjust the considered subgraph size.
#95697 merged Jun 24, 2025
[XLA][Numerics][HLO Value Tracking] Add a function to create an original value for an HLO instruction
#95600 merged Jun 24, 2025
Remove old heartbeat flags and arguments.
#95293 merged Jun 24, 2025
[ #HLODiff ] Add parent similarity based on the LCS of parent subgraph BFS sequence.
#95603 merged Jun 24, 2025
Take static buffers into account, when computing peak memory.
#95814 merged Jun 24, 2025
Retrieve compilation result gracefully with shared_ptr instead of std::move.
#95821 merged Jun 24, 2025
[XLA:CPU] Improve exp perf via more targeted nan handling.
#95811 merged Jun 24, 2025
DebugOptions: Mark all singular scalar fields as explicitly optional.
#95826 merged Jun 24, 2025
Fix typos in documentation strings
#95868 merged Jun 24, 2025
[xla:cpu:xnn] Store HLO -> XNNPACK unary/binary op mapping in maps.
#95888 merged Jun 24, 2025
[PjRt:CPU] Make the order of CPU memories the same as TPU memories
#95839 merged Jun 24, 2025
[XLA:CPU] Disable scatter & gather on AVX512
#95879 merged Jun 24, 2025
Fix a bug in RealDivWithF32ConstDivisor.
#95704 merged Jun 24, 2025
[tosa] Re-register the quantization dialect
#95844 merged Jun 24, 2025
Change Command Buffer Conversion Pass to convert thunks partially based on the enabled commands.
#95798 merged Jun 24, 2025
Cast output to float explicitly.
#95813 merged Jun 24, 2025
[XLA:GPU] Simplify if-statement.
#95886 merged Jun 24, 2025
Add TFLITE_XNNPACK_DELEGATE_FLAG_DISABLE_SUBGRAPH_RESHAPING flag
#95846 merged Jun 24, 2025
Reverts 1bfce10aa0d09118f5468179b075f17a84fcc75e
#95880 merged Jun 24, 2025
[mlir][tosa] Fix tfl to tosa for variable op
#95840 merged Jun 24, 2025
Allow KernelLoaderSpec to own its payload
#95872 merged Jun 24, 2025
[XLA:CPU] Add header to no_mkl single threaded matmul.
#95875 merged Jun 24, 2025
A minimal example of transforming a thunk sequence to command buffer -- I don't do any checks, just combine everything to one big CommandBuffer. A more reasonable sequence collection is coming in the following change.
#95518 merged Jun 24, 2025
[xla:gpu] Add a pass for RaggedDot lowering in XLA:GPU
#95802 merged Jun 24, 2025

93 Pull requests opened by 5 people

Fix incorrect per-channel scaling in fully_connected on Android
#95881 opened Jun 24, 2025
Support shardy in orbax.
#95883 opened Jun 24, 2025
[XLA:GPU]: Vectorize better for all-reduce kernel
#95884 opened Jun 24, 2025
[XLA:GPU] enable dynamic-slice instruction in generic triton support (try 2)
#95892 opened Jun 24, 2025
[XLA:GPU]: Calculate launch dimensions based on input size.
#95893 opened Jun 24, 2025
Integrate LLVM at llvm/llvm-project@13bb7948c914
#95896 opened Jun 24, 2025
[tf2jax] use V2 sharding if available.
#95897 opened Jun 24, 2025
Add metadata for CUDA and libtpu versions
#95903 opened Jun 24, 2025
Add CopyToRemote() to CommonPjRtBufferImpl.
#95907 opened Jun 24, 2025
#HLODiff Add bipartite matching to GreedyTopDownMatcher.
#95909 opened Jun 25, 2025
[xla] Add incorrectly formatted json string to the error message when converting json to proto. While here - fix imports, use absl types and std::string.
#95913 opened Jun 25, 2025
fix warning on swift withUnsafeBytes
#95917 opened Jun 25, 2025
Move expensive variables on their last use to avoid copies.
#95934 opened Jun 25, 2025
PR #28184: Extend `WhileLoopAllReduceCodeMotion` pass with a new pattern (DUS)
#95940 opened Jun 25, 2025
Implement RaggedDot in HloEvaluator
#95946 opened Jun 25, 2025
Remove custom logging implementation from TSL.
#95953 opened Jun 25, 2025
[XLA:GPU]: Calculate rank_offset and rotated_ranks outside the kernel.
#95954 opened Jun 25, 2025
[xla:layout_assignment] Support buffer types.
#95956 opened Jun 25, 2025
[xla:gpu] Handle buffer related custom calls.
#95957 opened Jun 25, 2025
Separate transfer streams and compute stream
#95965 opened Jun 25, 2025
[ #HLODiff ] Filter ignored opcodes before rendering results.
#95971 opened Jun 25, 2025
[XLA] Add offset to BufferAllocation.
#95974 opened Jun 25, 2025
Implement non-descending layout support for `ToLiteral`
#95979 opened Jun 25, 2025
Move `tensorflow/third_party/triton` to `xla/third_party/triton`
#95981 opened Jun 25, 2025
lite: Propagate an error during OpInit()
#95989 opened Jun 26, 2025
Remove LiteRT modules from TF python deps.
#95991 opened Jun 26, 2025
Modify MLIR location of TFLite op to contain TFLite tensor names
#95992 opened Jun 26, 2025
PR #26445: [NFC] Clean up StreamKind and StreamId from GpuCliqueKey
#96020 opened Jun 26, 2025
Integrate LLVM at llvm/llvm-project@13bb7948c914
#96024 opened Jun 26, 2025
Move `tensorflow/third_party/shardy` to `xla/third_party/shardy`
#96028 opened Jun 26, 2025
Create new CI configurations
#96034 opened Jun 26, 2025
Add attribute to configure different lane sizes in sparsecore preprocessing ops.
#96036 opened Jun 26, 2025
[XLA] Refactoring Reduce Window Rewriter to reduce complexity
#96043 opened Jun 26, 2025
[ #HLODiff ] Sort diff patterns by diff size instead of group size.
#96044 opened Jun 26, 2025
[ #HLODiff ] Refine computation diff.
#96046 opened Jun 26, 2025
Add random perturbations to the xla_tpu_msa_sort_order_overrides flag
#96050 opened Jun 27, 2025
PR #27836: Add support for NVSHMEM put/get
#96066 opened Jun 27, 2025
[XLA:GPU] Refactor code and avoid a FromProto/ToProto roundtrip.
#96069 opened Jun 27, 2025
Migrate from CanShareBuffer hook to AliasInfo.
#96075 opened Jun 27, 2025
Add a tuple sharding when creating get-tuple-element(tuple(single_result)).
#96080 opened Jun 27, 2025
Refactor the code for getting a sharding attribute from mlir::TF::XlaShardingOp.
#96081 opened Jun 27, 2025
[XLA:GPU] Flip unified cost model on.
#96087 opened Jun 27, 2025
strict_cc_test: Only set --gtest_fail_if_no_test_selected on CI.
#96088 opened Jun 27, 2025
is_ci_build: Move from XLA to TSL.
#96092 opened Jun 27, 2025
Eliminate circular dependency between device assignment and Computation Placement. Use Device Assignment for looking up logical and global ids.
#96103 opened Jun 27, 2025
[xla:cpu] Remove infeed/outfeed support from XLA:CPU
#96104 opened Jun 27, 2025
Remove all uses of `xla_test_library`
#96106 opened Jun 27, 2025
[XLA:SCHEDULING] Defer scheduling of allocate buffer custom calls
#96110 opened Jun 27, 2025
Add an async version of FunctionalHloRunner::Run that runs a pjrt executable.
#96111 opened Jun 27, 2025
Reverts 60077e5a6c50a9ad655fab6879e2bb432475b57b
#96112 opened Jun 27, 2025
Update VLOG message to make it more accurate.
#96114 opened Jun 27, 2025
Enable tests and fix for duplicate registration tests error
#96116 opened Jun 27, 2025
[Phase Compilation] Part-3: Augment extension with buffer deletion callback.
#96118 opened Jun 28, 2025
Add traceme for IFRT executable launch
#96119 opened Jun 28, 2025
OSS compiler test for aggregate types.
#96122 opened Jun 28, 2025
Integrate LLVM at llvm/llvm-project@67a5fc8e12dc
#96123 opened Jun 28, 2025
Remove extra space in stablehlo.while pretty print
#96127 opened Jun 28, 2025
Automated Code Change
#96143 opened Jun 29, 2025
Automated Code Change
#96149 opened Jun 29, 2025
Automated Code Change
#96158 opened Jun 29, 2025
Make IsDeleted() a const method
#96159 opened Jun 29, 2025
PR #19067: [XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#96177 opened Jun 30, 2025
Update protobuf to v6.30.1
#96183 opened Jun 30, 2025
[XLA:GPU] Deprecate CUBLAS_GEMM_DEFAULT_TENSOR_OP.
#96186 opened Jun 30, 2025
Upgrade Abseil to [LTS 20250127.1](https://github.com/abseil/abseil-cpp/releases/tag/20250127.1)
#96187 opened Jun 30, 2025
[XLA][Numerics][HLO Value Tracking] Add the original value recovery table
#96188 opened Jun 30, 2025
build: Set is_ci flag in OSS CI.
#96190 opened Jun 30, 2025
kernels: Remove use of icu::UnicodeStringAppendable
#96194 opened Jun 30, 2025
#sdy Move `AddAxisOrMergeInserter` to Shardy OSS utils.
#96195 opened Jun 30, 2025
[xla] Add an AUTHORS file.
#96198 opened Jun 30, 2025
Example Extension with cpp dependency injection
#96199 opened Jun 30, 2025
Experiment: Try to remove lite rt dependencies from TF wheel.
#96205 opened Jun 30, 2025
Integrate LLVM at llvm/llvm-project@67a5fc8e12dc
#96207 opened Jun 30, 2025
Fix TPU detection in Google Colab V2 environments
#96208 opened Jun 30, 2025
Construct XLA GPU client with coordination service client.
#96210 opened Jul 1, 2025
Use schedule order to calculate resource usage.
#96211 opened Jul 1, 2025
Remove redundant logs and cast.
#96212 opened Jul 1, 2025
Integrate LLVM at llvm/llvm-project@67a5fc8e12dc
#96213 opened Jul 1, 2025
Add 'mode' attribute to AllReduce and ReduceScatter.
#96214 opened Jul 1, 2025
Automated Code Change
#96217 opened Jul 1, 2025
Do not build xnn fusions inside reducers.
#96219 opened Jul 1, 2025
Internal change blah blah blah
#96220 opened Jul 1, 2025
Automated Code Change
#96222 opened Jul 1, 2025
Automated Code Change
#96223 opened Jul 1, 2025
Automated Code Change
#96227 opened Jul 1, 2025
Automated Code Change
#96233 opened Jul 1, 2025
Automated Code Change
#96235 opened Jul 1, 2025
Automated Code Change
#96237 opened Jul 1, 2025
Automated Code Change
#96240 opened Jul 1, 2025
[XLA:GPU] Remove horizontal fusion.
#96241 opened Jul 1, 2025
Bump the github-actions group with 2 updates
#96243 opened Jul 1, 2025
PR #19067: [XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#96244 opened Jul 1, 2025
PR #24579: [XLA:CPU][oneDNN] Add SiLU activation for oneDNN contractions
#96245 opened Jul 1, 2025

12 Issues closed by 5 people

tf.nn.conv2d with invalid input dims crashes in TF ≤2.19 — now raises InvalidArgumentError in nightly
#95415 closed Jul 1, 2025
My trained model detects almost every classes include target class.
#96060 closed Jun 30, 2025
erorr
#96160 closed Jun 30, 2025
Protobuf 4.24.0 break tensorflow and causes segfault with TF 2.12
#61551 closed Jun 29, 2025
Numpy and tf experimental Numpy differ in vander matrix creation case for N=0
#60628 closed Jun 28, 2025
[TFLite] flatbuffer64 support for TFlite
#60570 closed Jun 27, 2025
rejection_resample loses track of ragged tensors
#60583 closed Jun 27, 2025
Weird memory usage of shuffling in `tf.data.Dataset`
#60599 closed Jun 27, 2025
control_flow_ops_test unit test is flaky
#60629 closed Jun 27, 2025
Support/Feature Request: Pre-processing very large corpus text file as tokens to train GPT Models.
#60539 closed Jun 26, 2025
tf.linalg.matrix_rank results has different results with or without @tf.function for numpy inputs under tensorflow-cpu
#60547 closed Jun 26, 2025
how to build libtensorflowlite_c.so with Address Sanitizer
#95222 closed Jun 25, 2025

5 Issues opened by 5 people

how to get profile of per operation that delegate gpu opencl like cpu enable_op_profiling result, rather than only ModifyGraphWithDelegate?
#96239 opened Jul 1, 2025
Multiple segmentation faults and aborted in some modules
#96209 opened Jun 30, 2025
Unable to use Tensorflow and Tensorflow-TPU on Google Colab
#96203 opened Jun 30, 2025
Inconsistent `tf.math.reciprocal` Behavior for complex128 `inf` between CPU and GPU
#96180 opened Jun 30, 2025
Current Status Regarding TFLite vs LiteRT
#95928 opened Jun 25, 2025

50 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[oneDNN] Add Infer after last allow to be added to Allow list
#84874 commented on Jun 30, 2025 • 1 new comment
how to use libtensorflowlite_c.so C API and delegate gpu opencl correctly?
#95634 commented on Jun 24, 2025 • 0 new comments
build(aarch64): Update to oneDNN-3.7 + ACL-24.12 (fix)
#93951 commented on Jul 1, 2025 • 0 new comments
[NCCL] Upgrade TF NCCL version to 2.26.5
#94053 commented on Jun 26, 2025 • 0 new comments
Enable Stablehlo -> HLO lowering by default.
#94296 commented on Jun 25, 2025 • 0 new comments
Fix comparison functions and add unit tests
#94484 commented on Jun 30, 2025 • 0 new comments
#sdy #mixed_serialization don't make JAX export use `SdyRoundTripExportPipeline` to stringify attributes and convert ops to StableHLO `CustomCallOp`s and back.
#94871 commented on Jun 30, 2025 • 0 new comments
Move Mosaic CC sources into XLA
#94877 commented on Jul 1, 2025 • 0 new comments
[XLA] Add stack trace breakdown to `HloLiveRange::ToString` for peak memory usage
#94954 commented on Jun 27, 2025 • 0 new comments
Fix: Safely Capture and Store TF Operation Stack Trace at Creation to Prevent Dangling Reference Errors
#95034 commented on Jul 1, 2025 • 0 new comments
[Phase Compilation] Part-3: Add C++ layers to test and interact with C PJRT API.
#95089 commented on Jun 27, 2025 • 0 new comments
Port recent MHLO changes to StableHLO optimization path.
#95210 commented on Jun 25, 2025 • 0 new comments
[PROTOTYPE] Cleanup TFL dependencies in tosa
#95262 commented on Jul 1, 2025 • 0 new comments
Remove old heartbeat options.
#95294 commented on Jun 27, 2025 • 0 new comments
Fix subprocess.check_output decoding issue in pip_smoke_test.py to handle byte output safely
#95335 commented on Jul 1, 2025 • 0 new comments
io_utils: prevent `input()` crash in non-interactive mode
#95525 commented on Jul 1, 2025 • 0 new comments
Expose Auto Sharding to StableHLO.
#95535 commented on Jul 1, 2025 • 0 new comments
[mlir][tosa] Support negative axis in gather lowering
#95608 commented on Jul 1, 2025 • 0 new comments
Added `PjrtClient::UpdateGlobalProcessInfo` method.
#95611 commented on Jul 1, 2025 • 0 new comments
#HLODiff Add tiny bonus to position match
#95612 commented on Jun 24, 2025 • 0 new comments
Pass AliasInfo to HloAliasAnalysis.
#95797 commented on Jul 1, 2025 • 0 new comments
Add `use_shardy_partitioner` in `tf.tpu.XlaOptions`, `TPUReplicateMetadata`. The default value is false.
#95799 commented on Jul 1, 2025 • 0 new comments
Port `ThunkPassPipeline` with `CommandBufferConversionPass` into gpu_compiler. Introduce a flag, that switches creation of command buffers from HLO level to Thunk level. This is a temporary flag to provide a smooth transition.
#95801 commented on Jun 24, 2025 • 0 new comments
Integrate LLVM at llvm/llvm-project@bae48ac3c0e6
#95825 commented on Jun 24, 2025 • 0 new comments
[mlir][tosa] Allow variable indices in scatter_nd lowering
#95830 commented on Jun 25, 2025 • 0 new comments
Update Protobuf to 6.30.1
#95873 commented on Jun 30, 2025 • 0 new comments
Mismatch Between Quantized TFLite Layer Outputs and Expected Mathematical Values When Using get_tensor()
#93917 commented on Jun 26, 2025 • 0 new comments
typing_extensions >= 4.6.0 causes pip unit test failure
#60687 commented on Jun 26, 2025 • 0 new comments
api_compatibility_test fails on Python 3.11
#60679 commented on Jun 26, 2025 • 0 new comments
//tensorflow/python/ops/ragged:ragged_cross_op_test is flaky
#60670 commented on Jun 26, 2025 • 0 new comments
Insufficient documentation on GPU use, especially in MPS
#60647 commented on Jun 26, 2025 • 0 new comments
Incorrect gradient after divide operation when result contains inf
#60695 commented on Jun 26, 2025 • 0 new comments
can we customize memory allocation functions(like malloc/free) for inference with C api?
#60662 commented on Jun 26, 2025 • 0 new comments
TensorFlow Docker `tensorflow/tensorflow:latest-gpu` fails to detect GPU due to CUDA/cuDNN mismatch
#94593 commented on Jun 26, 2025 • 0 new comments
java.lang.IllegalArgumentException: Internal error: Error applying delegate:
#93525 commented on Jun 26, 2025 • 0 new comments
standalone pip package for tf.io.gfile.GFile
#45942 commented on Jun 26, 2025 • 0 new comments
YoloX different Model Output for Python and Android
#95489 commented on Jun 27, 2025 • 0 new comments
/libtensorflow_cc.so.2: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringB5cxx11ERKNS_15OpKernelContextEb
#95758 commented on Jun 30, 2025 • 0 new comments
Aborted (core dumped) in `tf.image.non_max_suppression` / `tf.raw_ops.NonMaxSuppressionV2` / `tf.raw_ops.NonMaxSuppressionV3` / `tf.raw_ops.NonMaxSuppressionV4`
#95760 commented on Jun 30, 2025 • 0 new comments
Aborted (core dumped) in `tf.distribute.Server`
#95762 commented on Jun 30, 2025 • 0 new comments
Runtime error when using empty axes for reduce operator
#95663 commented on Jun 30, 2025 • 0 new comments
It doesn't support on python3.13
#78774 commented on Jun 30, 2025 • 0 new comments
graph execution error bug with tfm.nlp.layers.MultiHeadRelativeAttention
#94599 commented on Jul 1, 2025 • 0 new comments
how to use libtensorflowlite_c.so C API and delegate gpu opencl correctly?
#95795 commented on Jul 1, 2025 • 0 new comments
[ROCm] Enable unsafe fp atomics and cleanup gpu_device_functions.h
#86704 commented on Jun 26, 2025 • 0 new comments
Fix compile error in tensorflow/python/tfcompile_wrapper.cc on s390x
#87676 commented on Jun 30, 2025 • 0 new comments
Fix: Ensure boolean_mask_v2() only accepts boolean dtype for mask
#89370 commented on Jun 30, 2025 • 0 new comments
Move duplicate CUDA/XLA registration logs from INFO to VLOG
#89808 commented on Jun 30, 2025 • 0 new comments
TfLite elementwise_ops add type support (#104)
#92706 commented on Jun 30, 2025 • 0 new comments
[XLA:GPU] NFC, refactor DotDecomposer.
#93521 commented on Jun 30, 2025 • 0 new comments