add static size information for hvcat and hvncat #58422
Conversation
@nanosoldier
I'm still not very happy with the performance for large matrix creation when the element types are different:

```julia
using BenchmarkTools
f(x, y) = [
    x y y y y y
    y x y y y y
    y y x y y y
    y y y x y y
    y y y y x y
    y y y y y x
]
@btime f(x, y) setup=(x = rand(); y = rand(1:10))
```

It takes 7 μs on my machine (and I'm expecting ~70 ns). But this looks more like a Julia compiler issue related to tuple splatting.
The package evaluation job you requested has completed - possible new issues were detected.

Report summary:
- ❗ Packages that crashed: 1 package crashed only on the current version; 4 packages crashed on the previous version too.
- ✖ Packages that failed: 532 packages failed only on the current version; 1459 packages failed on the previous version too.
- ✔ Packages that passed tests: 17 packages passed tests only on the current version; 4910 packages passed tests on the previous version too.
- ~ Packages that at least loaded: 3 packages successfully loaded only on the current version; 2715 packages successfully loaded on the previous version too.
- ➖ Packages that were skipped altogether: 1 package was skipped only on the current version; 922 packages were skipped on the previous version too.
This commit removes the generated function's dependency on IdSet, because IdSet is not yet defined during the bootstrap stage. By doing this, we avoid the world-age issues and can use @generated functions in abstractarray.jl.
(force-pushed from 86d863c to 9ed559f)
This significantly improves the performance of multi-dimensional array creation from scalar numbers via the methods behind [a b; c d], [a b;; c d] and the typed forms T[a b; c d], T[a b;; c d]. For small numeric array creation (length <= 16), manual loop unrolling is used to further minimize the overhead; it now has zero overhead and is as fast as the array initialization method.
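To make the "manual loop unroll" idea concrete, here is a minimal sketch of the technique for a fixed 2×2 case; it is illustrative only (the name `unrolled_hvcat_2x2` is made up, and this is not the PR's actual implementation):

```julia
# Sketch: emit one store per element instead of looping/splatting.
# Inside a @generated function the argument names refer to their *types*,
# so promote_type can pick the element type at compile time.
@generated function unrolled_hvcat_2x2(x11, x12, x21, x22)
    T = promote_type(x11, x12, x21, x22)
    quote
        A = Matrix{$T}(undef, 2, 2)
        @inbounds begin
            A[1, 1] = x11; A[1, 2] = x12
            A[2, 1] = x21; A[2, 2] = x22
        end
        A
    end
end

unrolled_hvcat_2x2(1, 2.0, 3, 4.0)  # 2×2 Matrix{Float64}
```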
(force-pushed from 9ed559f to 299ce1e)
```julia
@inline function hvcat_static(::Val{rows}, x::T, xs::Vararg{T}) where {rows, T<:Number}
    typed_hvcat_static(T, Val{rows}(), x, xs...)
end
```
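For context, under this PR's lowering (see the `Meta.@lower` output quoted later in the thread) a literal like `[2 2; 3 3]` becomes a call of roughly this shape, which the wrapper above forwards to `typed_hvcat_static` with `T` inferred from the arguments:

```julia
# What the surface syntax lowers to on a build of this PR
# (cf. the Meta.@lower output below):
Base.hvcat_static(Val((2, 2)), 2, 2, 3, 3)
```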
The only regression happens here:

```julia
using BenchmarkTools
f(x, y) = [
    x y y y y y
    y x y y y y
    y y x y y y
    y y y x y y
    y y y y x y
    y y y y y x
]
@btime f(x, y) setup=(x = rand(); y = rand())
```

It takes 984.615 ns on my local laptop. In comparison, it takes only 128.225 ns for Julia 1.12.
But this can be fixed if we duplicate the generated function definition:

```julia
@generated function hvcat_static(::Val{rows}, xs::T...) where {T<:Number, rows}
    # just copied everything from the typed_hvcat_static generated function's implementation
end
```

By applying this fix, we get 24.995 ns.
The only problems with this fix are:

- It introduces an ambiguity issue caused by `xs::T...`. That forces us to write `hvcat_static(::Val{rows}, x::T, xs::Vararg{T})` rather than `hvcat_static(::Val{rows}, xs::Vararg{T})` (see the toy example below).
- It duplicates the code, and I have almost no idea why the duplication works.
- It doesn't fix the performance for mixed element types, i.e., `@btime f(x, y) setup=(x = rand(); y = rand(1:10))` (add static size information for hvcat and hvncat #58422 (comment)).

Thus, I don't think this "fix" is elegant enough, so I did not commit it for this review version. If we find it necessary, I can add it later.
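To illustrate the kind of Vararg-driven ambiguity alluded to above, here is a generic toy example (the function `k` is made up; these are not the PR's actual signatures):

```julia
# Two Vararg methods where neither is strictly more specific:
k(x::Int, xs::Number...) = 1
k(x::Number, xs::Int...) = 2

k(1)  # MethodError: k(::Int64) is ambiguous; both methods match
```

Anchoring the first element explicitly, as in `(x::T, xs::Vararg{T})`, gives dispatch a concrete position to rank methods by and sidesteps this class of ambiguity.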
@nanosoldier
The package evaluation job you requested has completed - possible new issues were detected.

Report summary:
- ✖ Packages that failed: 9 packages failed only on the current version; 56 packages failed on the previous version too.
- ✔ Packages that passed tests: 6 packages passed tests only on the current version; 273 packages passed tests on the previous version too.
- ~ Packages that at least loaded: 2 packages successfully loaded only on the current version; 185 packages successfully loaded on the previous version too.
- ➖ Packages that were skipped altogether: 1 package was skipped on the previous version too.
PkgEval and CI tests look good; I'll keep this PR as it is until feedback comes.
This kinda scares me with respect to TTFX, over-specialization, and static compilability, especially for such a fundamental syntax. I don't think I can meaningfully review it beyond spreading this FUD; maybe someone closer to static compilation could evaluate its impacts?
TTFX is a good point; here's the small experiment I've done for 2×2 hvcat.

Based on the data, we get:

[figure: TTFX comparison for 2×2 hvcat]

The above experiment is about 2×2 hvcat, where the manual unroll is applied. The following is about 6×6 hvcat, where we call the generic path:

[figure: TTFX comparison for 6×6 hvcat]
```diff
@@ -2276,11 +2276,13 @@
 (if (any (lambda (row) (any vararg? row)) rows)
     `(call ,@hvcat_rows ,@(map (lambda (x) `(tuple ,@x)) rows))
     `(call ,@hvcat
-      (tuple ,@(map length rows))
+      (new (curly (top Val) (call (core tuple) ,@(map length rows))))
```
I get some inconsistency checking the result of @which:

```julia
julia> @which [2 2; 3 3]
hvcat(rows::Tuple{Vararg{Int64}}, xs::T...) where T<:Number
     @ Base abstractarray.jl:2203

julia> Meta.@lower [2 2; 3 3]
:($(Expr(:thunk, CodeInfo(
1 ─ %1 = builtin Core.tuple(2, 2)
│   %2 = builtin Core.apply_type(Base.Val, %1)
│   %3 = %new(%2)
│   %4 = dynamic Base.hvcat_static(%3, 2, 2, 3, 3)
└── return %4
))))
```

Is it because I need to update JuliaSyntax as well?
Is this related to / does this address #43844?
Can you explain a bit where this matters? Do you have some examples of performance-sensitive code which allocates small matrices from numbers? In my head at least, … That said, I am surprised how widely the benchmarks above vary between versions.
One place where this would have been nice is SciML/OrdinaryDiffEq.jl#2712. Constructing a matrix of literal values seems like the sort of thing that should be basically free, so it's somewhat unfortunate when it's expensive.
The following is a typical pattern when one wants to solve a problem iteratively (generated from ChatGPT; see the code below).

We identified issue #58397 when switching from Julia 1.9 to Julia 1.10 for our internal project, written by people who only know MATLAB and barely know Julia. In our case, we fixed it by teaching them how to use StaticArrays, but it's often the case that only power users of Julia know about StaticArrays and when to use them... And I won't be surprised if there are cases in the wild where array sizes are not deterministic like this one.

```julia
using LinearAlgebra

function lorenz_matrix(u)
    a, b = u[1,1], u[1,2]
    c, d = u[2,1], u[2,2]
    da = 10.0*(b - a)
    db = a*(28.0 - d) - b - 0.1*b*d
    dc = c*(b - c) - 0.1*c^2
    dd = a*c - (8/3)*d - 0.1*d^2
    return [da db; dc dd]
end

function euler_integrate(f, u0, tspan; dt=0.01, max_dt=0.1, min_dt=1e-6)
    t0, tf = tspan
    t = t0
    u = copy(u0)
    history = [(t, copy(u))]
    while t < tf
        # Compute derivative
        du = f(u)
        # Adaptive time stepping
        scale = norm(du) + 1e-10  # Prevent division by zero
        curr_dt = min(max_dt, max(min_dt, 0.1/scale))
        curr_dt = min(curr_dt, tf - t)  # Don't overshoot final time
        # Euler step
        u += curr_dt * du
        t += curr_dt
        push!(history, (t, copy(u)))
    end
    return history
end

# Initial condition
u0 = [1.0 0.01;
      0.01 0.0]

# Integrate
tspan = (0.0, 50.0)
history = @time euler_integrate(lorenz_matrix, u0, tspan)
```
@adienes Kind of, except there isn't much room for optimizing the 1D case. This PR intends to fix the performance gap for 2D and higher-dimensional cases, and applies to …
I don't have a strong opinion either way, but two questions that may guide decisions:

1. How does the time compare with GC? If you're at the point where small arrays are a major bottleneck, is eliminating the repeated array allocation usually better than the gain from this?
2. What are the trade-offs vs. a macro that would just expand the concat syntax into an elementwise fill? (A sketch of this idea follows.)
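For concreteness, here is a minimal sketch of what "expand the concat syntax into an elementwise fill" could mean; the macro itself is not written out, and `manual_fill` is a hypothetical name showing the code such an expansion might produce for `[x y; y x]`:

```julia
# What a hypothetical @fill_hvcat [x y; y x] could expand to:
function manual_fill(x, y)
    T = promote_type(typeof(x), typeof(y))
    A = Matrix{T}(undef, 2, 2)
    A[1, 1] = x; A[1, 2] = y   # first literal row
    A[2, 1] = y; A[2, 2] = x   # second literal row
    return A
end

manual_fill(1.0, 2)  # same result as [1.0 2; 2 1.0]
```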
@BioTurboNick I'm very much aware that there are many workarounds to this performance issue: reuse the allocated memory, rewrite manually or via macros, or adopt StaticArrays. They are all doable in a real project. The real reason I chose to provide a fix here at the language level is that I find it extremely awkward to explain this as a performance tip.

Considering this is a numerical computing language, such a performance tip is rather an excuse for something that we should fix but didn't. So if I were to sort the priorities, ensuring our users have a reasonable and consistent performance expectation of …
My question comes from my expectation that ever allocating an array on the heap in Julia is expensive and should be avoided in any performance-critical code.
After a quick chat with @timholy on Slack, let's summarize the PR: …

Here's what I plan to do next: …

MWE that breaks Zygote AD:

```julia
using Zygote

f(x, y) = [
    (x + y) (x - y);
    (x * y) (x / y)
]
g(x, y) = hvcat((2, 2), x + y, x - y, x * y, x / y)
@assert f(0.5, 1.5) == g(0.5, 1.5)

df(x, y) = gradient((_x, _y) -> sum(f(_x, _y)), x, y)
dg(x, y) = gradient((_x, _y) -> sum(g(_x, _y)), x, y)

df(0.5, 1.5)  # errors with mutation
```
@BioTurboNick This is very true, and we could even replace Array with tuples (which is what StaticArrays does under the hood; see the toy sketch below). We just need to do the fix with less impact, in a less controversial way.
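As a toy illustration of "replace Array with tuples" (the type `Mat2x2` is hypothetical, not from this PR or StaticArrays):

```julia
# A 2×2 matrix backed by a tuple: no heap allocation, size known statically.
struct Mat2x2{T}
    data::NTuple{4,T}  # column-major, matching Base array layout
end
Base.getindex(m::Mat2x2, i::Int, j::Int) = m.data[(j - 1) * 2 + i]

m = Mat2x2((1.0, 3.0, 2.0, 4.0))  # represents [1.0 2.0; 3.0 4.0]
m[1, 2]  # 2.0
```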
When creating an array from scalar numbers, the array initialization method is currently the fastest way. That is, constructing the array and filling it element by element is faster than the concatenation method `[x x; y y]`. This PR tries to close that performance gap (a benchmark sketch comparing the two styles follows).

Optimization overview: …

Key results: the performance of `[x x; y y]`-style concatenation is now comparable to that of direct matrix initialization. This resolves a previously significant performance gap.

Future work: …
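For reference, a minimal benchmark sketch of the two construction styles compared above (assuming BenchmarkTools; `init` and `concat` are throwaway names):

```julia
using BenchmarkTools

# Initialization method: allocate, then fill each element.
init(x, y) = (A = Matrix{Float64}(undef, 2, 2);
              A[1, 1] = x; A[1, 2] = x;
              A[2, 1] = y; A[2, 2] = y;
              A)

# Concatenation method: the syntax this PR optimizes.
concat(x, y) = [x x; y y]

@btime init(x, y)   setup=(x = rand(); y = rand())
@btime concat(x, y) setup=(x = rand(); y = rand())
```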
Benchmark performance

The very first version of this commit was implemented on the 1.10.9 release branch, so I've also included the 1.10-dev performance. Notice the performance differences between the initialization method and the concatenation method.

`[x x; y y]` — 2D Matrix Creation
- Same Data Type: 2×2 Matrix, 6×6 Matrix
- Mixed Data Types: 2×2 Matrix, 6×6 Matrix

`[x x;;; y y]` — Higher-Dimensional Array Creation
- Same Data Type: 1×2×2 Array, 3×3×3 Array
- Mixed Data Types: 1×2×2 Array, 3×3×3 Array

[benchmark tables omitted in this capture]
Benchmark test script: …
fixes #58397