Proposal: Adding mount options to functions. #320

Closed
alshabib opened this issue Oct 20, 2017 · 55 comments

Comments

@alshabib

  • Brief summary including motivation/context

The proposed change would allow functions to mount volumes and other directories through the normal docker configuration. This would allow a function to process relatively large amounts of data without having to pass it through http/stdin.

Any design changes

Add a Docker mount struct to the CreateFunctionRequest struct and pass it along in the create function handler.
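
As a rough illustration of the idea (not the actual patch), the sketch below models a deploy request that carries a list of mounts and renders each one in the `--mount` syntax accepted by the Docker CLI. The `Mount` and `CreateFunctionRequest` classes are hypothetical Python stand-ins for the Go structs named above, and all field names, paths, and image names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mount:
    """Hypothetical mount entry, loosely modelled on Docker's mount options."""
    type: str                 # "bind" or "volume"
    source: str               # host path or volume name
    target: str               # path inside the function container
    read_only: bool = False


@dataclass
class CreateFunctionRequest:
    """Hypothetical stand-in for the request struct mentioned above."""
    service: str
    image: str
    mounts: List[Mount] = field(default_factory=list)


def docker_mount_flags(req: CreateFunctionRequest) -> List[str]:
    """Render each requested mount as a `--mount` argument for the Docker CLI."""
    flags = []
    for m in req.mounts:
        spec = f"type={m.type},source={m.source},target={m.target}"
        if m.read_only:
            spec += ",readonly"
        flags += ["--mount", spec]
    return flags


# Example request: one read-only bind mount (names and paths are made up).
req = CreateFunctionRequest(
    service="ip-lookup",
    image="example/ip-lookup:latest",
    mounts=[Mount("bind", "/srv/configs", "/data", read_only=True)],
)
print(docker_mount_flags(req))
```

A request without mounts yields an empty list, so existing functions would be unaffected under this model.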

Pros + Cons

Pros:

  • Functions will be able to have volumes.

Cons:

  • Volumes should be available on all nodes running functions (but that is rather expected)

Effort required
Little, it's a two line change.

alshabib added a commit to alshabib/faas that referenced this issue Oct 20, 2017
This allows functions to mount volumes and other
directories. It uses the same configuration as is
used by docker.

Fixes openfaas#320

Signed-off-by: Ali Al-Shabibi <[email protected]>
@rgee0
Contributor

rgee0 commented Oct 21, 2017

Derek add label: proposal

@ericstoekl
Contributor

Personally I think this is a fantastic idea. @alshabib , do you have any specific use-cases in mind that would necessitate this functionality? I'm thinking maybe a database (like MongoDB) as a function...?

@alshabib
Author

alshabib commented Oct 25, 2017 via email

@alexellis
Member

We might revisit this in the future but I think it is an anti-pattern for functions which are short-lived and stateless. This will encourage stateful behavior and assumptions.

@alshabib
Author

I agree that this feature may encourage bad behavior, but then again you cannot stop people from shooting themselves in the foot.

Agree also that functions should be stateless and short-lived but that does not mean that they will not consume or emit large amounts of data and there is no reason why the volume of this data should be limited by the http session. This is simply an alternative method to providing input to a function.

Would you prefer an option to openfaas which would disable this feature rather than not providing it entirely?

@dexterg

dexterg commented Dec 18, 2017

I am writing a Ruby function that finds an IP address in configuration files (firewall, proxy, bigip, etc.).
There is a nightly cron process (too long for FaaS) that downloads these files (with scp, expect, etc.).

That's a good use case for volume mount functionality, no?

@Toggi3

Toggi3 commented Jan 13, 2018

Feedback (take it or leave it): This is the only missing feature that prevented me from deploying this system for functions that handle batches of file pulls/pushes, translations, scraping. It's really an amazing framework, but I have to often deal with very large flows of handling files gathered from abc protocols due to xyz legal or contractual obligations. A serverless function system like this with the ability to have any kind of volume support would be pretty helpful. I really want full blown compose functionality with regards to volumes and network.

File access is way more reasonable for this kind of thing.

I hope you guys reconsider implementing such. I would be interested in what workarounds I can apply to achieve the result of bind mounting a specific fixed directory on all swarm workers in the cluster. I might just deploy with the related patch to solve my problem. Is there any other way to solve it? A service on the same network maybe?

@alexellis
Member

alexellis commented Jan 13, 2018

Hi I'd like to know more about your usecase. Do you have any specifics?
Functions are not supposed to be stateful - it breaks the design, however I think you should try Minio object storage for your storage needs - makes sense to explore the recommended approach for the project/architecture.

@Toggi3

Toggi3 commented Jan 14, 2018

So, I'll give you one such function, say you have to transcode/transcribe a proprietary audio format designed for storing call center interactions. You need to extract from it the audio payload, make it into something that can be understood by a voice transcription engine, as well as extract the other data and store it in a way that it can be digested for its analytics value, and place both of those things in a place where they can be picked up by another process later. I do a lot of batch information pulling/pushing/translating/feeding/scraping across many files for a lot of clients, but occasionally patterns emerge where I'd love to be able to develop a nice parallel function that I can feed variables and a list of files to do work like decrypting these multiple batches of 50000 audio files, dropped on some FTP site on storage we own and can export to our container host machines...

I'd rather not do such a thing through a layer like S3, I need something at least slightly faster, and NFS is simple.

@alshabib
Author

alshabib commented Jan 14, 2018 via email

@Toggi3

Toggi3 commented Jan 14, 2018

I ask that you give us just enough rope to hang ourselves if we so choose. We understand the spirit of your project, we just want a way out of its limitations that is easy for us to use.

@alshabib
Author

alshabib commented Jan 14, 2018 via email

@alexellis
Member

Using object storage is simple and fast. I'd encourage anyone on this thread to try our recommended approach before pushing back and insisting on volume mounting.

Minio / Ceph are both super easy to setup on Docker or Kubernetes:

https://minio.io

Once you have the daemon running you can access it in the cluster and create buckets / push/fetch objects via a client library.

Here's the Python library for instance:

https://docs.minio.io/docs/python-client-quickstart-guide

I'm planning on providing a small sample function but in the meantime there's our colorisebot which you can check out for inspiration.
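
A minimal sketch of the fetch, process, push pattern described above, for readers who want to see its shape. It assumes `client` exposes `fget_object` and `fput_object` with the same argument order as the Minio Python client; the function name, bucket, and object keys are hypothetical, and `transform` stands for whatever file-based work the function does.

```python
import os
import tempfile


def process_via_object_store(client, bucket, key_in, key_out, transform):
    """Download an object, transform it on local disk, upload the result.

    `client` is assumed to behave like minio.Minio (fget_object/fput_object).
    `transform(src_path, dst_path)` is a caller-supplied function that reads
    the input file and writes the output file.
    """
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "input")
        dst = os.path.join(workdir, "output")
        client.fget_object(bucket, key_in, src)    # object store -> local file
        transform(src, dst)                        # ordinary file-based work
        client.fput_object(bucket, key_out, dst)   # local file -> object store
    # The temporary directory is removed here, so nothing persists between calls.
```

Existing code that only knows how to read and write files can be passed in as `transform` unchanged, at the cost of one download and one upload per invocation.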

@Toggi3

Toggi3 commented Jan 24, 2018

It's an appealing solution, and I appreciate it for sure, the only problem with it is the preexisting infrastructure and scripts that depend on a volume being there. I will definitely take a look in any case, but I won't have time to refactor the many jobs to use s3 when they currently use volumes. Many of us are trying to use newer tools like this to simplify older systems, and while I might be able to sell my boss on the philosophy of why s3 might be better, there is simply too much work to do to scale such a mountain of technical debt just to appease a design preference.

Unfortunately, it appears fission isn't better in this regard. I might have to kludge something stupid together with Jenkins to kick off runs I guess... Any other input welcome as volumes are a must at least for the intermediate. Thank you for your work even if we couldn't come together on this problem. It's a good project.

@RawSanj

RawSanj commented Jan 25, 2018

@alexellis So if I deploy Minio on K8S and have NFS as a PersistentVolume for Minio and then store files in Minio from Functions, I will essentially be able to access NFS from the Functions.

Is that correct? And would it be a good idea to do this?

@alexellis
Member

alexellis commented Jan 25, 2018

@raw I'm not sure you've read the new blog on using Minio & OpenFaaS?

@alexellis
Member

@Toggi3 If you have a binary that reads a file from the filesystem, then copy the file with Minio's mc command into the right location, do your work, then use mc cp to copy it to where it belongs after that. From your description I can't see any hard dependency on volumes.

@RawSanj

RawSanj commented Jan 25, 2018

@alexellis I'm sorry, which blog are you talking about? The Getting started OpenFaas on minikube?
Or is there any other blog for Volumes in OpenFaaS?

@alexellis
Member

alexellis commented Jan 25, 2018

All - I've written a blog post on how to use object storage (S3) with OpenFaaS - just like you have to do with AWS Lambda, Azure Functions or similar. You have client libraries available for most programming languages including a binary client for bash. It's very easy to use and setup:

https://blog.alexellis.io/openfaas-storage-for-your-functions/

The performance is also good. This powers Colorisebot on Twitter.

@Toggi3

Toggi3 commented Jan 25, 2018

What if that thing is a gpg encrypted archive that is >10GB that has to be decrypted then untar'd, dumps out a ton of proprietary audio files from a call interaction system that have to be transcribed into digestible pcm wav and csv metadata by another process and stored back on the volume for another process to pick up?

I have to first wait for a copy operation from s3, do my operations, then copy it back? Too much time. I already have to sometimes pull these things from remote SFTP, S3, google drive, locations over the internet and I am targeting 24 hour turnaround for jobs like these every day, end-to-end. We don't choose how our payloads are constructed, or even necessarily how they are delivered, because we aren't the producers of them. Some of these payloads are not nice to work with at all.
Our customers pay us to worry about that problem for them.

@alexellis
Member

@Toggi3 you'd have the same problem(s)/issue(s) with an underlying NFS filesystem. Moving a 10GB file around on the network is a very specialist problem.

@Toggi3

Toggi3 commented Jan 25, 2018

Over the weekend I might try to do as you suggest and compare performance. I agree I have a very specialist problem, for which I have been seeking out specialist solutions like docker functions...

@mycodecrafting

mycodecrafting commented Feb 2, 2018

So a persistent posix compatible volume introduces too much state in the system, but a persistent object store does not? That doesn't even make any sense. State is state.

But let's say that's a valid argument for argument's sake. There is also the fact that there's not an object store out there that can compete with the performance of distributed parallel filesystems designed for high performance computing.

Or that very few real world applications can easily or efficiently interact with an object store. Not everyone is working with brand new shiny applications. Very few people are. Most of us have to deal with legacy applications, and work to slowly change them over time while we dream of rewriting it in the always distant "someday."

Most of us also rely in some way on 3rd party libraries and apps, and again most of those cannot easily or efficiently interact with an object store.

Copy the file down and back up? If functions are indeed supposed to be short-lived, then suggesting that they should spend 2/3 of their runtime performing network transfers is rather silly. It's also a massive waste of CPU resources and time. Now we're required to have a substantial amount of additional resources in order to perform the same amount of tasks in the same time as we could if we could just grab the data off of a volume.
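
To make the 2/3 figure concrete, here is a back-of-the-envelope sketch. The payload size, link speed, and processing time are assumed numbers chosen for illustration, not measurements from this thread, and the model assumes the output is as large as the input and that the link runs at full speed.

```python
def transfer_fraction(size_gb, link_gbit_per_s, compute_s):
    """Fraction of total runtime spent copying data down and back up."""
    one_way_s = size_gb * 8 / link_gbit_per_s   # GB -> gigabits, then seconds
    transfer_s = 2 * one_way_s                  # download plus upload
    return transfer_s / (transfer_s + compute_s)


# Assumed numbers: 10 GB payload, 1 Gbit/s link, 80 s of actual processing.
print(round(transfer_fraction(10, 1.0, 80), 2))  # prints 0.67
```

Under these assumptions each direction takes 80 seconds, so 160 of the 240 seconds are network transfer. A faster link or longer processing time shrinks the fraction accordingly.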

But let's just say we're fine with copying the file down and back up. What about operations where we need large amounts of disk space. Let's take video transcoding for example. We may need several hundred GBs or more of disk scratch space to perform the operation. We probably want to be able to run more than one function at a time on each server. And we're unlikely to have servers sitting around with several TB of local disk attached to each one, especially in the cloud. It's just cost-prohibitive. But we are probably more likely or inclined to have a large high performing distributed filesystem mounted on each one. Here's an example where we want the mount not for state at all (remember we are assuming here that we're fine with copying a massive file down and back up), but just for temp/scratch space in order to carry out the function.

Don't get me wrong, I'm a big fan of this project. And I can admire your dedication to the principles of the project. But the world isn't as black and white, and there are a whole host of people that you're shutting the door to the project on because they can't do something as simple as bind a mount. The door is shut on anyone with any kind of legacy application that they want to start to use something like this for. The door is shut on anyone with an application that has "a very specialist problem." The world is very specialized, and there are a whole lot of specialized applications. You're excluding a lot of people from benefiting from this project over what is such a small request.

It's your project, so do what you will, but at the end of the day nobody is asking for anything that docker doesn't already do. All that is being asked is that people be able to utilize an existing basic feature of docker.

Let's go back to the beginning (state) for fun. Functions can connect to any external service they want, databases for example. Is a mount point really going to encourage stateful behavior more than a database connection does? I don't really think so. You don't prevent a function from interacting with the world outside of it -- most of which is stateful. I don't see how a volume is fundamentally any different.

@justinfx

justinfx commented Apr 8, 2020

I would like to follow up on this feature request. Honestly it's something that is causing me to hit a wall with introducing Openfaas into our current pipeline and offering a migration path from our current VMs and shared process manager approaches to deploying small arbitrary services and event handlers for users. While I understand that it is considered an anti-pattern to rely on mounted volumes for state and configuration in containers, it is also very limiting for cases where it is needed.
In our case, our studio has enormous amounts of existing tooling and code where dependencies will read from the filesystem for configs that have not been migrated to something like Consul. So we have two use cases where we need our existing pipeline environments to work inside a container:

  1. read-only mounts for dependencies and filesystem configs
  2. read-write mounts for functions and services that need to perform some kind of data transformation

Sure it would be great if all of our code were updated to pull configs from consul, could be 100% packaged as standalone in a container, and do any filesystem data transformations through an object-store api. But we aren't there yet and the transition would be slow. We definitely want to get to this point though.

Furthermore, Openfaas states that it officially also supports long-running microservice workloads in addition to short-lived functions. So to say that Openfaas only focuses on faas patterns doesn't seem to align with that extended support? I feel it would be ideal to enable users to solve their problems, even if it means they have to enable the feature and that there are warnings and notes around the pattern as being less than ideal. In our case, it would really help transitioning our 15+ year old pipeline.

It seems Fission supports Volumes now in their pod spec:
https://docs.fission.io/docs/spec/podspec/volume/

But honestly, I want to use Openfaas. I've already prototyped custom templates for a facility private template store. I have written some patches to faas-cli to support nested template store paths within a repo. I like the faas-cli functionality and how it unifies deploying functions and services. But my one sticking point is the volume mounting limitation. Is there really no value in providing volume mounting at this point, even when competing frameworks like Fission seem to see value in providing it? Could it maybe be a feature that has to be enabled at the openfaas deployment configuration level to opt into the support?

As a semi-related anecdote, I maintain the build and deploy system for our code at my studio. It happens to be an extension of the Waf build system. Now the maintainer of the Waf build system is extremely opinionated about what should and should not be allowed in the build process for a user, which has led to some feature requests or pull requests being denied. In which case, they end up being an extension added to our build system instead, because we need to enable users to solve their problems. There may not be something directly provided as a 1st class concept in our build system layer, but then we still give users enough flexibility to do what they need to do to solve their problems. They may need to opt into a feature that is documented with caveats or opinions.

@feiniao0308

Actually, I made a similar request before: #1232.

@justinfx

justinfx commented Apr 8, 2020

Yes @feiniao0308 it sounds like my situation as well, where we have a studio with tons and tons of library and application versioned deployments to various nfs mounts. They are deployed frequently by many teams. It is currently not feasible for us to fully package them into a container as we aren't 100% able to trace all the dependencies. Some libraries link against other libraries, so you have to resolve the entire dependency chain, even looking at the RPATH of linked libraries, etc, etc.
Exposing the NFS mounts via minio to the container still puts the responsibility on the code in the container to know the entire dependency chain to pull in and put into correct places in the container filesystem. It's not ideal, but it's what we have right now.

@feiniao0308

Exactly. I hope OpenFaaS could expose the mount option. When I searched the issues for the mount keyword, I saw many similar ones.

@justinfx

I've confirmed on each of their slack channels that both Fission and Nuclio support full expression of volume mounting in their yaml specs. It would be really awesome if Openfaas would match the support.

@feiniao0308

Not sure if OpenFaaS will expose the mount option and let the function owner make the decision. It'll make OpenFaaS more flexible if the function has this option exposed. @alexellis @justinfx

@justinfx

@feiniao0308 I've already got NFS mounts working in the PodSpec, in Fission.io functions.

@feiniao0308

@justinfx you add extra steps to update function pod spec after it's deployed? How do you make it work?

@justinfx

justinfx commented Apr 21, 2020

@justinfx you add extra steps to update function pod spec after it's deployed? How do you make it work?

Not to get too off topic about another project, but you just use their --spec flag to generate all the yaml. Then you customise the PodSpec once. And then you can deploy with fission spec apply.

@feiniao0308

@justinfx thanks for the info. I'll check that project. Thanks!

@pyramation

just +1'ing this as a feature that would be great to have. I've read everyone's arguments in this issue as well as this one #1178 and I think it's fair to say this would be a great option.

@justinfx

I was reading about Hashicorp Nomad and the integration with OpenFaas via the faas-nomad provider. On the topic of volume mounts...
Nomad provides support for legacy command deployments that can't easily be wrapped into a container, as well as the volume mounting for all job types. This suits my work environment quite well as a way to get tons of legacy code up and running from departments that don't have the resources to focus on containerised solutions. That being said, is there some way to pass through the nomad job/group volume options with this approach? Or is all of that control still abstracted away at the openfaas faas-provider layer? I thought maybe I could still use openfaas if we picked up nomad and could mount our shared nfs volumes or host mounts.
Looking for any solution to this proposal. Because so far I have needed to commit to using Fission in my prototype work, for the volume mounting support.

@alexellis
Member

@pyramation what is your specific use-case, and have you tried object storage yet?

@alexellis
Member

@justinfx can you detail what your function does? It is likely that you can use object storage such as Minio or build data into the container image.

@justinfx

justinfx commented Aug 7, 2020

Hi @alexellis . I work at a large visual effects studio, with many years of legacy code making up the pipeline. In addition to traditional core and pipeline developers we have hundreds of artists with some level of coding skills that are capable of scripting tooling around their primary digital content creation packages (Autodesk Maya, Foundry Nuke, SideFx houdini, ...). Our common way of interacting with project data is through a complex abstraction on top of NFS mounts and filers. Layers are built on layers, with applications and libraries that write to the file system.
So here would be a hypothetical example case. A technical artist from a department responsible for simulating muscles over an animated skeleton rig wants to monitor for the event of that animated rig having a newer version published. In response to this publish event, a function should fire that looks up the asset locations of the new animated rig, resolve their location on the nfs file system, open a scene file in another location on the file system, reimport the new rig version, version up the scene file on disk, and then submit a new muscle simulation version to our render farm (let's just pretend this whole version up, validation, and submission takes <60s). So the interactions with the nfs file system mounts are important here, for all the existing code responsible for dealing with asset management, and the animation package where the simulation is loaded and rendered, and the output files being stored. A lot of code from the ground up would have to be rewritten to go through an object storage API to make use of Minio (as far as I understand). Not to mention that we don't even have control over the 3rd party applications that don't know how to use the object storage API.
Another issue that we have is our environment management system, which we use to combine many many versions of software together to run on different projects, and even modifications on a per scene or shot basis to software versions. All of this software is currently stored on our nfs mounts. Now we have the future goal of being able to 100% containerize our applications and services, but we aren't there yet. We can do it with our services, but not yet with many of our applications. So our current solution for these cases is to mount the software deploy locations so that dependencies can be picked up. We can slowly transition to having a better containerization story across the board, but we can't instantly convert to going through an object storage API in all cases.
My point is that preventing the NFS volume mounting on a principle issue is just limiting our access to this framework while we try and migrate to newer solutions.

@pyramation

@justinfx that's interesting! so if I understand correctly, some assets are so large that it's better to have functions dynamically attach themselves to read (and potentially write) to the drives to perform an operation, vs having to download them over a wire each time. (p.s. I used to work for SESI w/houdini)

@alexellis my number one use case right now is developer experience for creating openfaas functions, particularly hot-loading. I'm using kubernetes and openfaas and if, during development, I could hot-load my code, I would save quite a bit of time that I normally sit there building the docker images for every code change. In the case of nodejs it would save me up to a minute for every code change. Even when using python, it can feel like a larger-than-needed compile step for any code change, compared to if we had a hot-loading volume the changes take milliseconds - Simon did write something for docker-compose https://gitlab.com/MrSimonEmms/openfaas-functions/-/blob/master/docker-compose.yaml#L12 but it would be great if there was a solution for k8s.

@justinfx

justinfx commented Aug 7, 2020

@pyramation yes I should have put a little more effort into focusing on the data question from @alexellis. We generate lots of data. It would not be uncommon for a simulation to generate 1TB of temporary or intermediate data. Our pipelines are about transforming data until ultimately we produce final pictures for a movie. So the idea of using functions in our pipeline would be to respond to async events and perform transformations on arbitrary data. Some work would be too time consuming and need far too many resources to be done in a function invocation, in which case we would just use functions to trigger jobs in our render farm. But there is plenty of work to be done in event handlers where we need access to image data, simulation data, scene files, and applications and libraries that may have no support for an object storage API. We need the flexibility to support these workflows, even if ultimately it would be better to do what we can through an object storage api that maybe proxies to our nfs file system.

@justinfx

@alexellis have you had time to consider my last replies to your question as to why a Minio solution would not be sufficient? I would like to know if your position is firm on this and we cannot expect Openfaas to ever allow any kind of mounts (nfs, hostPath, config map). Or if maybe with the amount of support for this feature request, possibly your position has softened to where it could be an opt-in configuration option in the deployment of Openfaas? I feel that there have been enough replies to your request for justification of the feature that it warrants some kind of support to bring this project in line with the same offering in other frameworks.

@funkymonkeymonk

Hey folks. I wanted to bring this up again because by not allowing access to volumes, functions are not able to communicate with the GPIO pins using /dev/mem. This is a problem for me and the workaround is to run your containers in privileged mode, which feels like an even worse idea than possibly allowing state in containers. Given the pi is an explicit deploy target and IoT is an explicit use case this seems like a miss. Is there a workaround here that I'm missing?

@justinfx

@funkymonkeymonk it seems clear that Openfaas has a hard stance against allowing mounts. But a workaround for this limitation is to use mutating webhooks in kubernetes, which would let you do something like an annotation declaring a need for a mount, and the webhook can mutate the spec and add the volumes. You could either write and deploy a mutating web hook manually, or implement it in something like Open Policy Agent.
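
A minimal sketch of the patch-building half of such a webhook, assuming a hypothetical annotation `example.com/nfs-mount` whose value has the form `server:/export:/mount/path`. The annotation key, volume name, and helper names are all made up for the example; only the AdmissionReview response shape (`uid`, `allowed`, `patchType`, base64-encoded `patch`) follows the Kubernetes admission API. The HTTPS server and webhook registration are omitted.

```python
import base64
import json

ANNOTATION = "example.com/nfs-mount"  # hypothetical annotation key


def build_patch(pod):
    """Return JSONPatch operations adding the NFS volume requested by annotation."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    value = annotations.get(ANNOTATION)
    if not value:
        return []
    server, export, mount_path = value.split(":")
    volume = {"name": "nfs-data", "nfs": {"server": server, "path": export}}
    mount = {"name": "nfs-data", "mountPath": mount_path}

    spec = pod["spec"]
    ops = []
    # A trailing "/-" appends to an existing array; otherwise create the array.
    if "volumes" in spec:
        ops.append({"op": "add", "path": "/spec/volumes/-", "value": volume})
    else:
        ops.append({"op": "add", "path": "/spec/volumes", "value": [volume]})
    for i, container in enumerate(spec["containers"]):
        base = f"/spec/containers/{i}/volumeMounts"
        if "volumeMounts" in container:
            ops.append({"op": "add", "path": base + "/-", "value": mount})
        else:
            ops.append({"op": "add", "path": base, "value": [mount]})
    return ops


def admission_response(review):
    """Build the AdmissionReview reply for an incoming pod admission request."""
    request = review["request"]
    response = {"uid": request["uid"], "allowed": True}
    ops = build_patch(request["object"])
    if ops:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(ops).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Pods without the annotation are admitted unchanged, so the mount stays opt-in per function.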

@funkymonkeymonk

Thanks for the thought. Unfortunately I am using faasd to avoid having to run k3s, so Kubernetes-based solutions require a full rethink, and honestly if I'm going that route I'll likely look at alternatives instead.

@Docteur-RS

I also really wish that volumes were exposed....
We use Openfaas to download and uncompress files before uploading them on S3. But the files are too big to be downloaded on the Kubernetes node itself. We need a volume!
To fix this we had to make our own deployments matching the ones created by openfaas and add the volumes ourselves. It's working but we can't use coldstart anymore...
Also the volumes are always up even if the functions are not in use.

@aslanpour

I also have a use case that requires applying --volume to the container to allow USB access for a container to connect to the TPU device that is attached to my Pi. I am not sure if there is a way other than --volume to achieve this.

@alexellis
Member

I own a Coral edge TPU, so find out exactly what is required and copy and paste the pod spec here. We will not be enabling privileged mode for functions, which I saw you request a day or two ago. I sent you some examples on the issue with devices etc.

Did you try them?

@aslanpour

Thank you, Alex. I got your point. I am going to document how I gave functions TPU, GPU, etc access and will share it here later.

@Docteur-RS

Finally we searched for another technology... We really need big volumes attached to our functions.

We tried Fission. It does provide a new volume for each new function but unfortunately it doesn't scale back to 0.
This means that we always have one pod with a huge mounted volume. A pod that does nothing... Just waiting there to serve a request. So tons of money lost doing nothing really.

We ended up using Gitfaas.... It creates one pod per request and you have a complete access to the deployment's specs. So we have a clean volume created each time.
It also allows to define the volume size at startup. So no money loss over oversized volumes. And it scales back to 0. No money lost here either.

I know that Openfaas has chosen to fork processes to gain speed. But as leaders on this subject I don't like it when they force people into thinking that FAAS must always be short lived and fast. It's just the road they chose to go down.
Volumes support is simply impossible to achieve in openfaas due to its technical implementation. And I'm okay with this.

Each piece of tech has its strengths and weaknesses.
