FriesischScott
Contributor

As discussed in #234 the goal of this PR is to drop the separate balance tables for users and groups and instead compute the balances from all expenses in the database.

I agree with @krokosik that we need to be certain we can handle the required loads for large instances with lots of expenses, but I think PostgreSQL will be up for the task.

This is a list of the required balances:

  1. Friend balances
  2. Overall group balances for one user
  3. All balances for a specific group

Here I've implemented the second and third, and I think @krokosik has started working on the first.
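
To make the three balance views concrete, here is a minimal in-memory sketch of deriving each of them from expense rows alone. It is not the SQL used in this PR: the `ExpenseShare` shape, the helper names and the string keys are assumptions made for illustration, and amounts are assumed to be bigint minor units.

```typescript
// Hypothetical flattened expense row: `borrowedBy` owes `amount` to `paidBy`.
interface ExpenseShare {
  groupId: number | null; // null for expenses outside any group (assumption)
  paidBy: number;
  borrowedBy: number;
  currency: string;
  amount: bigint; // minor units, e.g. cents
}

// Adds `delta` to the entry for `key`, starting from zero.
const add = (map: Map<string, bigint>, key: string, delta: bigint): void => {
  map.set(key, (map.get(key) ?? 0n) + delta);
};

// 1. Friend balances: net per (friend, currency); positive means the friend owes the user.
function friendBalancesForUser(shares: ExpenseShare[], userId: number): Map<string, bigint> {
  const result = new Map<string, bigint>();
  for (const s of shares) {
    if (s.paidBy === s.borrowedBy) continue; // own share, no debt
    if (s.paidBy === userId) add(result, `${s.borrowedBy}:${s.currency}`, s.amount);
    else if (s.borrowedBy === userId) add(result, `${s.paidBy}:${s.currency}`, -s.amount);
  }
  return result;
}

// 2. Overall group balances for one user: net per (group, currency).
function groupBalancesForUser(shares: ExpenseShare[], userId: number): Map<string, bigint> {
  const result = new Map<string, bigint>();
  for (const s of shares) {
    if (s.groupId === null || s.paidBy === s.borrowedBy) continue;
    if (s.paidBy === userId) add(result, `${s.groupId}:${s.currency}`, s.amount);
    else if (s.borrowedBy === userId) add(result, `${s.groupId}:${s.currency}`, -s.amount);
  }
  return result;
}

// 3. All balances for a specific group: net per (lower id, higher id, currency);
// positive means the higher id owes the lower id.
function balancesForGroup(shares: ExpenseShare[], groupId: number): Map<string, bigint> {
  const result = new Map<string, bigint>();
  for (const s of shares) {
    if (s.groupId !== groupId || s.paidBy === s.borrowedBy) continue;
    // Normalise the pair so that A->B and B->A net out in a single entry.
    const [lo, hi] = s.paidBy < s.borrowedBy ? [s.paidBy, s.borrowedBy] : [s.borrowedBy, s.paidBy];
    const sign = s.paidBy === lo ? 1n : -1n;
    add(result, `${lo}:${hi}:${s.currency}`, sign * s.amount);
  }
  return result;
}
```

In the database the same aggregation would be a grouped sum over the expense rows, so no separate balance table has to be kept in sync.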

Because Prisma can't handle the necessary computations through its JavaScript interface, I've chosen to use TypedSQL to write native SQL queries and still benefit from Prisma's typing capabilities.

One minor downside is that TypedSQL currently requires a live database connection to generate the types. However, with Prisma 7 the location of the generated client will move out of node_modules and we will have to set an output directory like this.

We could then commit the generated sql types in the sql subfolder of the prisma client and only generate the regular client during build. This should already be possible now (only in prisma 7 it's mandatory) and I'm working on setting it up.

@guenzd

guenzd commented Jun 9, 2025

I get this error when I try to run with existing data:
[screenshot]

@FriesischScott
Contributor Author

I get this error when I try to run with existing data: [screenshot]

Yes, I haven't finished fixing a few type issues. You should be able to run it with pnpm dev but build is not yet working.

@guenzd

guenzd commented Jun 9, 2025

I get this error when I try to run with existing data: [screenshot]

Yes, I haven't finished fixing a few type issues. You should be able to run it with pnpm dev but build is not yet working.

Then there are more runtime issues :/

[screenshot]

@FriesischScott
Contributor Author

I get this error when I try to run with existing data: [screenshot]

Yes, I haven't finished fixing a few type issues. You should be able to run it with pnpm dev but build is not yet working.

Then there are more runtime issues :/
[screenshot]

Can you try again now?

Turns out I wasn't up to date on the migrations and for some reason when the column is bigint the sum is returned as numeric. I'm now casting explicitly to bigint.

1000 users in 101 groups with 10k expenses each
Type Hash for faster lookup with =
@FriesischScott FriesischScott marked this pull request as ready for review June 18, 2025 12:29
@FriesischScott
Contributor Author

FriesischScott commented Jun 18, 2025

I've updated the seed file to create 1000 users and 101 groups with 10k expenses each. The largest group has 30 members; all the others have 10. It should give us a better understanding of the performance.

It does take quite a while to create all the expenses but it's a temporary change anyway. If we merge we can just revert the file completely.

@krokosik
Collaborator

Fantastic work, thank you 🙌
I know I promised to take a swing at it this week, but unfortunately it's not likely to happen due to other obligations :/

@FriesischScott
Contributor Author

Fantastic work, thank you 🙌 I know I promised to take a swing at it this week, but unfortunately it's not likely to happen due to other obligations :/

No problem at all. Take your time. Better not to rush things :)

Collaborator

@krokosik krokosik left a comment

Very sorry for taking so long with the review. I had to take care of other stuff as well as more pressing current issues. I requested some changes that would minimize this PR further as I believe such risky changes should be as easy to review as possible. I ran some initial tests on my machine and it is looking good, so I would like to proceed with this, but only after they have been resolved and the branch gets rebased onto main. Furthermore, if you could give me write access to your fork, I will implement friend balances as well :)

Collaborator

The original seed script is useful for local dev work, while this one takes a long time to execute. Please move it to a separate file.

Contributor Author

My plan was to only use this as an intermediate script and revert to the original before we merge. If you think it's worth keeping around, I'll move it to a separate script.

Comment on lines +11 to +15
  const sortByIds = (a: getAllBalancesForGroup.Result, b: getAllBalancesForGroup.Result) => {
    if (a.paidBy === b.paidBy) {
      return a.borrowedBy - b.borrowedBy;
    }
-   return a.userId - b.userId;
+   return a.paidBy - b.paidBy;
Collaborator

I would prefer to keep this PR as minimal as possible. As such, please keep the property names identical and create a type GroupBalance = getAllBalancesForGroup.Result alias

Contributor Author

Just to confirm, you want me to keep the existing firendId including the typo?

Collaborator

While I agree the naming even without the typo is quite unfortunate, I would prefer to make such a critical PR as small and easy to review as possible. We can think about another PR afterwards.

Comment on lines 60 to 81
const groupBalances = await ctx.db.$queryRawTyped(getGroupsWithBalances(ctx.session.user.id));

const _groups = groupBalances
  .map((b) => {
    return {
      id: b.id,
      name: b.name,
    };
  })
  .filter((obj, index, self) => index === self.findIndex((t) => t.id === obj.id));

const groupsWithBalances = _groups.map((group) => {
  const balancesForGroup: Record<string, bigint> = {};
  groupBalances
    .filter((b) => {
      return b.id == group.id;
    })
    .forEach((b) => {
      if (b.currency != null && b.balance != null) {
        balancesForGroup[b.currency] = b.balance;
      }
    });
Collaborator

While we remain on the dev branch, I propose the following sanity check:
utilize both the legacy and new method and compare their results. When a mismatch is detected, throw an Error. We will remove it before creating a release.
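
A check along these lines could look roughly like the sketch below. It assumes both methods yield per-currency balances as a `Record<string, bigint>` (the shape used in the diff above); `diffBalances` and `assertBalancesMatch` are hypothetical helper names, and treating a missing currency as zero is an assumption, not existing project behaviour.

```typescript
// Balances keyed by currency code, as in the diff under review.
type CurrencyBalances = Record<string, bigint>;

// Lists every currency whose legacy and computed balances differ.
// A missing entry is treated as zero, so {} and { EUR: 0n } compare equal.
function diffBalances(legacy: CurrencyBalances, computed: CurrencyBalances): string[] {
  const currencies = new Set([...Object.keys(legacy), ...Object.keys(computed)]);
  const mismatches: string[] = [];
  for (const currency of currencies) {
    const a = legacy[currency] ?? 0n;
    const b = computed[currency] ?? 0n;
    if (a !== b) {
      mismatches.push(`${currency}: legacy=${a} computed=${b}`);
    }
  }
  return mismatches;
}

// Throws when the two methods disagree; intended for the dev branch only.
function assertBalancesMatch(
  context: string,
  legacy: CurrencyBalances,
  computed: CurrencyBalances,
): void {
  const mismatches = diffBalances(legacy, computed);
  if (mismatches.length > 0) {
    throw new Error(`Balance mismatch for ${context}: ${mismatches.join(", ")}`);
  }
}
```

Called once per group with a label such as the group id, the error message names the group and the currencies that diverge, which should make a mismatch easy to trace.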

@krokosik
Collaborator

krokosik commented Jul 7, 2025

Oh and one more thing: this change would break the current Splitwise import functionality, as it does not import expenses as of now. #207 needs to be implemented first, and as such I don't expect to go ahead with this PR for the 1.5 release.

@FriesischScott
Contributor Author

Oh and one more thing: this change would break the current Splitwise import functionality, as it does not import expenses as of now. #207 needs to be implemented first, and as such I don't expect to go ahead with this PR for the 1.5 release.

No problem. More time to polish.

@alexanderwassbjer
Contributor

alexanderwassbjer commented Aug 5, 2025

This is really good work! Calculating it based on the expenses rather than the balance table is amazing! 👏

@krokosik krokosik changed the title from "Compute balances in database" to "WIP: Compute balances in database" Sep 5, 2025
@krokosik krokosik marked this pull request as draft September 5, 2025 11:55
@krokosik krokosik mentioned this pull request Oct 7, 2025
@FriesischScott
Contributor Author

Thanks for the ping. I'm of course happy to finish this.

I'll start by merging in all the changes and fixing the conflicts. Shouldn't take more than a day or two until I have something up to date and working.

@krokosik
Collaborator

krokosik commented Oct 9, 2025

Okay nice, since I would like to include my changes as well in this PR and have some thoughts I would like to share, would you like to connect on Discord to chat more conveniently?

@FriesischScott
Contributor Author

I've fixed the conflicts and properly formatted the SQL files. So far everything still seems to work as intended. The check currently fails because GitHub Actions doesn't allow changing the command of the postgres container to load the pg_cron extension.

The easiest fix would probably be if you could update the custom postgres image to contain the shared_preload_libraries.

@FriesischScott
Contributor Author

Okay nice, since I would like to include my changes as well in this PR and have some thoughts I would like to share, would you like to connect on Discord to chat more conveniently?

Yeah, I think that would help. I'll email you my username.

@alexanderwassbjer
Contributor

I've fixed the conflicts and properly formatted the SQL files. So far everything still seems to work as intended. The check currently fails because GitHub Actions doesn't allow changing the command of the postgres container to load the pg_cron extension.

The easiest fix would probably be if you could update the custom postgres image to contain the shared_preload_libraries.

Good work! Looking forward to this change.

@krokosik
Collaborator

@FriesischScott I've sent you a friend request. I am now working on an extensive seeding script with fakerjs and some social network graph generation algorithms. It would be nice to have a deterministic dev db, and we could establish some data consistency checks on it and, more importantly, compare the existing balance tables with the query-calculated ones.
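
As a rough illustration of what "deterministic" means here, the sketch below drives expense generation from a small seeded PRNG, so the same seed always yields the same rows. It does not reflect the actual seed script: the mulberry32-style generator, the `SeedExpense` shape and `generateExpenses` are assumptions for illustration only.

```typescript
// Small seeded PRNG in the style of mulberry32; returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical minimal expense row for seeding.
interface SeedExpense {
  paidBy: number;
  borrowedBy: number;
  amount: bigint; // minor units
}

// Generates `expenseCount` expenses between `userCount` users (ids 0..userCount-1).
// The output depends only on the arguments, never on Math.random() or the clock.
function generateExpenses(seed: number, userCount: number, expenseCount: number): SeedExpense[] {
  if (userCount < 2) throw new Error("need at least two users");
  const rand = mulberry32(seed);
  const pick = (n: number): number => Math.floor(rand() * n);
  const expenses: SeedExpense[] = [];
  for (let i = 0; i < expenseCount; i++) {
    const paidBy = pick(userCount);
    // Offset by 1..userCount-1 so payer and borrower always differ.
    const borrowedBy = (paidBy + 1 + pick(userCount - 1)) % userCount;
    expenses.push({ paidBy, borrowedBy, amount: BigInt(1 + pick(10000)) });
  }
  return expenses;
}
```

With a fixed seed, consistency checks can compare balances against known expected values from run to run.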

@FriesischScott
Contributor Author

@FriesischScott I've sent you a friend request. I am now working on an extensive seeding script with fakerjs and some social network graph generation algorithms. It would be nice to have a deterministic dev db, and we could establish some data consistency checks on it and, more importantly, compare the existing balance tables with the query-calculated ones.

Good plan!

@krokosik
Collaborator

The build error seems to stem from the fact you don't provide startup commands for cron. See the repo compose files for reference.

@krokosik
Collaborator

@FriesischScott The seed script has been merged :)
It should serve as a replacement for your largeSeed file and is more nuanced, with edited and deleted expenses.
You can tweak the number of users and groups to get more expenses and I recommend checking out the CONTRIBUTING.md. Experiment with the params by calling pnpm tsx src/dummies to get a SEED_STATISTICS.md report until you have some desirable data.

The next step would be to implement checks. Currently we still create balances, which also prevents us from inserting expenses in parallel, as all the transactions create deadlocks. I would propose writing some consistency checks that compare as many cases of query-calculated balances to the table ones as possible, preferably all of them.

Once that is done, we can plan next steps, as the situation with direct user-user balances requires us to think through the transition from current Total balances to user-user ones.

@krokosik krokosik mentioned this pull request Oct 19, 2025
5 tasks
4 participants