-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: persistent lists #11767
base: master
Are you sure you want to change the base?
feat: persistent lists #11767
Conversation
Memory management bugs are tricky to debug, because their effects tend to be discovered only later on, and in rather uninformative ways. The warnings in https://sinusoid.es/immer/memory.html#garbage-collected-heap hold true, but I wouldn't be surprised if another property is involved in our crash here. Does Does this work if GC is disabled? If so, we might be dealing with a missing GC root. If it also fails with GC disabled, we might be looking at some sort of memory corruption, such as the lack of initialization you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things that jumped out browsing through the code (not a review)
@@ -133,34 +146,6 @@ class ExternalValueBase | |||
std::ostream & operator << (std::ostream & str, const ExternalValueBase & v); | |||
|
|||
|
|||
class ListBuilder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to keep this around for the existing C API.
It should be easy to implement with immer
types, I suppose.
Whether we continue to use ListBuilder
internally is another question, and I could see "no" as a likely answer to it.
tList1, | ||
tList2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two are an optimization that lets the evaluator produce small lists without any allocations besides e.g. the thunk that already existed. I think we'll want to keep tList1
and tList2
for this purpose.
// declare a memory policy for using a tracing garbage collector | ||
using gc_policy = immer::memory_policy<immer::heap_policy<immer::gc_heap>, | ||
immer::no_refcount_policy, | ||
immer::default_lock_policy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what the default lock policy does.
I suppose clever locking might save allocations in a few cases, but I'm inclined to think that just allocating all the time is more efficient, because we tend to keep the "old version" around because purely functional operations are non-destructive and our scopes/closures (Env
) tend to stick around fairly often in practice, and if they don't, we don't really know locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The locking is not used for allocation actually, but it's used by immer::atom
so no need to worry about it :)
Thanks for doing this experiment. Maybe to see if this goes in the right direction you could try to do some micro benchmark before having to fix all builtins etc? Than we could already get a performance preview. |
also cc @arximboldi |
I can eval my NixOS config! I tried re-adding uses of |
0251ac3
to
2ee7758
Compare
Rebased to fix merge conflicts. |
No reading list unfortunately; except perhaps for the bdwgc docs, which are mostly in its headers. It's reference documentation, not really learning material. The symptoms you describe suggest that something is not getting marked and then collected by accident, or in other words, some heap objects aren't reachable from the garbage collection roots such as the (main) stack or C++ heap objects allocated with |
Adding |
@Mic92 I added stats to the numbers section. @roberth I know that with Haskell, one can use allocation stats provided by GHC's RTS as a weak proxy for performance -- any idea if that's the case with Nix as well? My understanding was that a non-trivial amount of time is spent in the garbage collector. |
@arximboldi I'm having trouble with transient data structures -- do you have any insight? In void ExprList::eval(EvalState & state, Env & env, Value & v)
{
auto list = state.allocList();
for (auto & i : elems)
*list = list->push_back(i->maybeThunk(state, env));
v.mkList(list);
} I've tried to rewrite it to use void ExprList::eval(EvalState & state, Env & env, Value & v)
{
auto list = state.allocList();
auto transient = list->transient();
for (auto & i : elems) {
transient.push_back(i->maybeThunk(state, env));
}
*list = transient.persistent();
v.mkList(list);
} Unfortunately, when I do that I get an exception when calling nix: /nix/store/gkghbi5d6849mwsbcdhnqljz2xnjvnis-immer-0.8.1-unstable-2024-09-18/include/immer/transience/gc_transience_policy.hpp:95: ownee &immer::gc_transience_policy::apply<immer::heap_policy<immer::gc_heap>>::type::ownee::operator=(edit) [HeapPolicy = immer::heap_policy<immer::gc_heap>]: Assertion `e != noone' failed. I believe something is being garbage collected when it shouldn't be, and verified this by checking that running with I know that in typedef std::vector<Value *, traceable_allocator<Value *>> ValueVector;
typedef std::unordered_map<Symbol, Value *, std::hash<Symbol>, std::equal_to<Symbol>, traceable_allocator<std::pair<const Symbol, Value *>>> ValueMap;
typedef std::map<Symbol, ValueVector, std::less<Symbol>, traceable_allocator<std::pair<const Symbol, ValueVector>>> ValueVectorMap; Do I need to do something similar for Do you have any ideas on what I can do to troubleshoot or debug this? From the error, I'm not entirely sure if it's the |
To my understanding, this is a consequence of widespread use of immutable data structures.
Recent versions of Nix will tell you how much time is spent in GC. Evaluating my NixOS config it spends 10% of time in GC, but an individual package might only spend 1.5% in GC. (anecdata) |
So currently we are actually increasing memory usage? I wonder what happens if we remove the optimization from nix for 1,2 element lists from the current implementation. Maybe the comparison is than fairer? |
Arrays are pretty compact, fwiw. Maybe this optimization would pay off if we had constant folding? IIRC we haven't previously done constant folding because it didn't pay off, but perhaps that was because of all the copying. Another possibility is that the data structure can't be used efficiently because it has to assume that multiple references to it exist, even though those references might be all from the garbage. |
Hi @ConnorBaker ! Thanks a lot for this contribution. I am a fan of Nix and use it in all my systems, so this gets me very excited! 🎉 😄 Taking a look at the code, the two things I can see could improve:
As to why are those crashes happening... you mention that the values are being inappropriately GC'ed. You are using the |
Ok, just took a look and it seems we are indeed using Boehm's GC here. And the the implementation of I see instances of something that resembles refcounting in the code though (`nix_gc_incref, etc.). That could be integrated via the memory policy. I think it'd be nice for someone with a bit more knowledge about how the lifetimes of C++ values is managed in Nix and how it all interacts with the GC. It's unclear to me: which allocations are should be made with Boehm's GC Also, what's |
Looking at it again... I wouldn't fully discard that there may be a bug in Immer's use of |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/is-it-possible-for-us-to-remove-builtins-functionargs/51960/4 |
d7f712f
to
f742092
Compare
I’ve stalled a bit here in terms of will to make progress. I’ve been thinking about implementations for the data structures we use for lists and attribute sets, and though this is naive, I would think the best implementation would make the most common operations the cheapest to perform. Seeing as how lists are implemented as arrays of pointers to values instead of values themselves, I don’t know that the overhead involved in persistent data structures is worth what we can get back in sharing. Having lazy evaluation, I don’t know that it would make sense to have lists contain actual values rather than pointers to them — for most of our usage of the language, would cache locality matter? Attribute sets are vectors of Attr, but I remember that at some point they were two arrays of values — one for names, and one for values. Given that outside attribute set merging common patterns of use involve iterating over the names or values of attribute sets, would it make sense for the implementation to make such operations cheap? As I understand it, getting the names or values involved iterating over the vector of Attr and storing the desired components in a new collection. Not related to lists or attribute sets — strings! I understand the SymbolTable is doing something like string interning? Is there a better data structure we could use rather than a vector? @tomberek in #9034 you mentioned using the symbol table for all strings; are you aware of any existing library we could use for more efficient string implementations or operations, or should we look into implementing our own? |
Important
This PR is a work in progress. I may force-push.
Experimental work on switching to persistent lists via immer (thanks for the library suggestion @Mic92!).
Current blockers:
Various places use collections fromstd
without configuring them to use the GC.A printing test fails (see TODOs insrc/libexpr/print-ambiguous.cc
), but only when run throughnix-instantiate --eval
--nix eval
returns the correct result. That's odd and makes me think the failing test is symptomatic of a larger problem I introduced.printAmbiguous
I'm unable to evaluate my own system closure because an uninitialized value sneaks in. To help debug this, I've aprintError
toValue::type
(seesrc/libexpr/value.hh
), which is what was throwing the error. I've also added an assert toEvalState::forceValue
(seesrc/libexpr/eval-inline.hh
) to ensure an error is thrown if an uninitialized value is forced.transient
ValueList
through thenew
keyword.ValueList::push_back
usesstd::move
on the argument; I don't know enough CPP to know what that means if the argument is used afterwards.NIX_SHOW_STATS
no longer shows helpful things for lists besides concatenations(++)
operand is no longer usingconcatLists
, it's not updating the number of list concatenations tracked byEvalState
.Motivation
As far as I can tell, lists in Nix are structures with raw C arrays and a size. Most list operations involve allocating an array large enough to hold the result and then
memcpy
-ing components over.This PR looks at using persistent lists via immer -- my hope is that by leveraging techniques from function programming, we can reduce the amount of allocation and copying we do and improve performance of the interpreter.
Context
TODO
Early numbers
Setup
Run on a NixOS system with an Intel i9-13900K.
The current implementation is about 1.13x faster than the one in this PR. Put another way, the changes in this PR slow down eval by around 12% (using system evaluation times,
3.339-2.924)/3.339=0.124
).NOTE: I verified outputs from both versions are the same.
System evaluation
./nix-before/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel
./nix-after/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel
List-concatenation-heavy evaluation
./nix-before/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names
./nix-after/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names
$ NIX_SHOW_STATS_PATH="$PWD/list-concat-eval-bench-before.json" NIX_SHOW_STATS=1 ./nix-before/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names &> /dev/null
$ NIX_SHOW_STATS_PATH="$PWD/list-concat-eval-bench-after.json" NIX_SHOW_STATS=1 ./nix-after/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names &> /dev/null
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.