Releases: shareup/shllm

v0.5.0

29 Jul 20:16
0811eb0
  • Add support for downloading and running inference with the Gemma 3 4B 3-bit quantized model
  • Simplify the LLM static accessors by removing most duplicates (e.g., gemma2_2B and gemma2_9B are combined into gemma2); see the sketch below
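
A minimal sketch of the combined accessor, modeled on the v0.1.0 example below; the parameter and return shapes are assumptions drawn from that example:

let input: UserInput = .init(messages: [
  ["role": "user", "content": "Summarize the plot of Hamlet."],
])

// `gemma2` replaces the old gemma2_2B and gemma2_9B accessors; it can
// return nil, so bail out if the model is unavailable (assumed behavior).
guard let llm = try gemma2(input) else { return }
for try await reply in llm {
  print(reply)
}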

v0.4.3

23 Jul 22:39
0a80595
  • Add SHLLM.cacheLimit, SHLLM.memoryLimit, and SHLLM.recommendedMaxWorkingSetSize
  • Remove processing: UserInput.Processing argument from Gemma 3 initializers
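
A usage sketch for the new properties; treating them as settable byte counts that mirror MLX's GPU limits is an assumption:

// Assumed: cacheLimit and memoryLimit are settable byte counts, and
// recommendedMaxWorkingSetSize reflects the Metal device's value.
SHLLM.cacheLimit = 512 * 1024 * 1024        // cap the buffer cache at 512 MB
SHLLM.memoryLimit = 2 * 1024 * 1024 * 1024  // cap GPU memory at 2 GB
print("Working set:", SHLLM.recommendedMaxWorkingSetSize)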

v0.4.2

23 Jul 21:30
cc96f0c
  • Update mlx-swift-examples
  • Add processing: UserInput.Processing? argument to LLM.init()
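
A sketch of passing the new argument. UserInput.Processing comes from MLXLMCommon, where resize shrinks images before vision-model inference; the gemma3_4B accessor name and the exact way the argument is threaded through are assumptions:

import CoreGraphics

// Downscale attached images to 448x448 before inference (assumed API shape).
let processing = UserInput.Processing(resize: CGSize(width: 448, height: 448))
guard let llm = try gemma3_4B(input, processing: processing) else { return }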

v0.4.1

02 Jul 20:41
a9194de
  • Re-export LanguageModel, LLMModel, and VLMModel
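
The practical effect, sketched with an illustrative (hypothetical) helper: these MLXLMCommon types can now be referenced with only an SHLLM import.

import SHLLM

// No separate `import MLXLMCommon` needed for the re-exported types.
func warmUp(_ model: any LanguageModel) {
  // hypothetical helper body
}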

v0.4.0

01 Jul 01:59
c7a43f7
  • Add Gemma 3 1B (text-only) and 4B, 12B, and 27B (vision) models
  • Add support for the new tool-calling mechanism (sketched below)
  • Fix a crash when models exceed the maximum output token count
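
A minimal sketch of tool calling, assuming it follows mlx-swift-examples' convention of passing OpenAI-style tool-spec dictionaries through UserInput; the exact SHLLM surface may differ:

// An OpenAI-style function spec (assumed shape).
let weatherTool: [String: any Sendable] = [
  "type": "function",
  "function": [
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": [
      "type": "object",
      "properties": ["city": ["type": "string"]],
      "required": ["city"],
    ],
  ],
]

let input: UserInput = .init(
  messages: [["role": "user", "content": "What's the weather in Lisbon?"]],
  tools: [weatherTool]
)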

v0.3.0

22 Jun 22:07
7a32678
  • Update to the newest release of mlx-swift-examples
  • Use the new, simplified MLX method of loading and initializing models (see the sketch below)
  • Improve the algorithm for limiting the maximum number of input tokens
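
For context, the simplified loading path looks roughly like this; that the release refers to MLXLMCommon's loadModel(id:) convenience, which SHLLM wraps internally, is an assumption:

import MLXLMCommon

// Downloads the weights if needed and initializes the model in one call.
let model = try await loadModel(id: "mlx-community/Qwen2.5-7B-Instruct-4bit")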

v0.2.0

04 May 23:34
98f6549

v0.1.0

06 Apr 20:02
d0b807e
  • Convert LLM into an AsyncSequence instead of an actor:
let input: UserInput = .init(messages: [
  ["role": "system", "content": "You are a helpful assistant."],
  ["role": "user", "content": "What is the meaning of life?"],
])

// The accessor can return nil; bail out if the model is unavailable.
guard let llm = try qwen2_5__7B(input) else { return }

// LLM is an AsyncSequence, so replies stream through for-await.
for try await reply in llm {
  print(reply)
}

v0.0.6

06 Mar 15:18
a762d47
  • Add LLM.clearCache()
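
Usage is a single call; whether it clears the GPU buffer cache or the on-disk model cache, and whether it throws, are assumptions here:

// Assumed: clearCache() is throwing and frees cached data.
try LLM.clearCache()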

v0.0.5

03 Mar 12:33
4e8dbf2
  • Export Message and UserInput types.
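
A sketch of the exported types in use, modeled on the v0.1.0 example above; Message as a role/content dictionary is an assumption drawn from that example:

let system: Message = ["role": "system", "content": "You are terse."]
let question: Message = ["role": "user", "content": "Define entropy in one line."]
let input = UserInput(messages: [system, question])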