Skip to content

Commit 8c4b8ff

Browse files
committed
Fix AnnexScalaInstance jar cache misses due to ordering and path comparison problems
The jar cache in AnnexScalaInstance is a ConcurrentHashMap that uses a string representation of the compiler classpath (a list of jars) as its key. Problem is that list of jars which is used to create the key is not sorted, so you would sometimes get cache misses for the same classpath because the list had a different ordering. At a large enough scale, this would cause the JVM to run out of CodeHeap and crash. Presumably this was because we'd create up to n! AnnexScalaInstances for every AnnexScalaInstance and then classload for every single one of those instances. This change prevents those cache misses and thus prevents the worker from crashing. This should also fix cache misses with path mapping by comparing the short path, so we don't include the bazel arch/config specific parts of the path.
1 parent 6919531 commit 8c4b8ff

File tree

2 files changed

+40
-8
lines changed

2 files changed

+40
-8
lines changed

src/main/scala/higherkindness/rules_scala/workers/common/AnnexScalaInstance.scala

Lines changed: 16 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ import java.net.URLClassLoader
77
import java.nio.file.{Files, Path, Paths}
88
import java.util.Properties
99
import java.util.concurrent.ConcurrentHashMap
10-
import scala.collection.immutable.Map
10+
import scala.collection.immutable.TreeMap
1111

1212
object AnnexScalaInstance {
1313
// See the comment on getAnnexScalaInstance as to why this is necessary
@@ -48,7 +48,7 @@ object AnnexScalaInstance {
4848
* from a prior compilation in combination with a new ScalaInstance for the current compilation. Or there's some issue
4949
* with Scala's classpath caching.
5050
*
51-
* If all three caches are enabled (this cache + Zinc claspath + Scala compiler), then the non-determinism seems go
51+
* If all three caches are enabled (this cache + Zinc claspath + Scala compiler), then the non-determinism seems to go
5252
* away.
5353
*
5454
* If this cache is not used, but the Zinc classpath + Scala compiler caches are used, then non-determinsm shows up in
@@ -67,11 +67,20 @@ object AnnexScalaInstance {
6767
* any different from leaving Zinc's cache enabled.
6868
*/
6969
private def getAnnexScalaInstance(allJars: Array[File], workDir: Path): AnnexScalaInstance = {
70-
// We need to remove the sandbox prefix from the paths in order to compare them.
71-
val mapBuilder = Map.newBuilder[Path, Path]
70+
// We want to compare short paths to avoid the Bazel sandbox prefix and arch/config specific parts of the path
71+
// We use a tree map because we want the keys sorted for comparison purposes
72+
val mapBuilder = TreeMap.newBuilder[Path, Path]
73+
val absoluteWorkDir = workDir.toAbsolutePath().normalize()
7274
allJars.foreach { jar =>
73-
val comparableJarPath = jar.toPath().toAbsolutePath().normalize()
74-
mapBuilder.addOne(jar.toPath -> workDir.toAbsolutePath().normalize().relativize(comparableJarPath))
75+
val absoluteJarPath = jar.toPath().toAbsolutePath().normalize()
76+
// Remove the sandbox prefix as well as the arch/config specific parts of the path.
77+
mapBuilder.addOne(
78+
jar.toPath ->
79+
FileUtil.bazelShortPath(
80+
absoluteWorkDir.relativize(absoluteJarPath),
81+
replaceExternal = false,
82+
),
83+
)
7584
}
7685
val workRequestJarToWorkerJar = mapBuilder.result()
7786

@@ -84,7 +93,7 @@ object AnnexScalaInstance {
8493
// but that would require hashing the all the classpath jars on every compilation request. I
8594
// imagine that would cause a performance hit.
8695
//
87-
// I also imagine it is extremeley rare to be mutating the contents of compiler classpath jars
96+
// I also imagine it is extremely rare to be mutating the contents of compiler classpath jars
8897
// while keeping the names the same, e.g., generating new scala library jar for scala 2.13.14.
8998
// As a result I'm leaving this string based for now.
9099
val key = workRequestJarToWorkerJar.values.mkString(":")

src/main/scala/higherkindness/rules_scala/workers/common/FileUtil.scala

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ package workers.common
44
import scala.annotation.tailrec
55
import java.io.{File, IOException}
66
import java.nio.channels.FileChannel
7-
import java.nio.file.{FileAlreadyExistsException, FileVisitResult, Files, Path, SimpleFileVisitor, StandardCopyOption, StandardOpenOption}
7+
import java.nio.file.{FileAlreadyExistsException, FileVisitResult, Files, Path, Paths, SimpleFileVisitor, StandardCopyOption, StandardOpenOption}
88
import java.nio.file.attribute.BasicFileAttributes
99
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
1010

@@ -59,6 +59,29 @@ object FileUtil {
5959
path.getFileName().toString().stripPrefix("header_").stripPrefix("processed_")
6060
}
6161

62+
/**
63+
* Given a Bazel path to the bazel output directory, return a path excluding the bazel configuration specific parts of
64+
* the path. This is analogous to Bazel's File.short_path function. Fair warning: this function is super good enough,
65+
* but is likely not perfect.
66+
*/
67+
def bazelShortPath(path: Path, replaceExternal: Boolean = true): Path = {
68+
val nameCount = path.getNameCount()
69+
val pathString = path.toAbsolutePath().normalize().toString()
70+
71+
val shortPath = if (path.startsWith("bazel-out") && nameCount >= 4) {
72+
path.subpath(3, nameCount)
73+
} else {
74+
path
75+
}
76+
77+
// Handle difference between Bazel's external directory being referred to as .. in the short_path
78+
if (replaceExternal && shortPath.startsWith("external")) {
79+
Paths.get(shortPath.toString().replaceFirst("external", ".."))
80+
} else {
81+
shortPath
82+
}
83+
}
84+
6285
def copy(source: Path, target: Path) = Files.walkFileTree(source, new CopyFileVisitor(source, target))
6386

6487
def delete(path: Path) = Files.walkFileTree(path, new DeleteFileVisitor)

0 commit comments

Comments
 (0)