LapDevelopment
These pages are intended for LAP-internal use, to aid communication among developers. As such, they complement the LAP email list and the SVN repository. At least initially, only registered members of the LapGroup can edit this page (and sub-pages, as long as we make sure to assign access rights correctly). However, as these pages can be viewed by the public (including unregistered wiki users), sensitive or private information (e.g. passwords) should not be posted here.
Ideally, all components beyond the basic operating system that determine LAP functionality (and configuration) should be version-controlled in UiO-hosted repositories. In practice, we are currently working to put all tools, the API library, tool descriptions, and the full Galaxy instance under SVN control (using the LAP repository).
Wherever possible, LAP services run as non-privileged users, typically laportal.
Files that are only required on the LAP front-end node should be organized on its local disks (typically below /home/laportal/), rather than in the cluster-wide shared LAP directory (/projects/lap/).
No anonymous commits to SVN (or other activity that could not be traced to one individual). For example, laportal receives updates from SVN, but changes should only be checked in from real developer accounts.
Following is an attempt at breaking down LAP into major component pieces. More in-depth discussion of individual pieces should be delegated to sub-pages, for example the LapDevelopment/Tree page.
LAP provides (language) processing services, e.g. segmenting a document into sentences and tokens, tagging parts of speech, and syntactic dependency parsing. We will refer to individual processing steps (e.g. sentence segmentation) as tools, and to a (possibly singleton) configuration of tools (e.g. the above pipeline) as a workflow. All tools available through LAP are maintained in a version-controlled repository, dubbed the LAP Tree. Tools in the LAP Tree are ready-to-run binaries for a reasonably recent 64-bit (x86) Linux environment; external dependencies beyond basic operating system components and a small set of standard shared libraries are wrapped with each tool, to make it possible to run tools from the LAP Tree without system administrator support. To allow installation of tools from the LAP Tree into an arbitrary location, there cannot be any configuration in terms of absolute path names; instead, where necessary, tools can refer to relative locations below the root of the LAP Tree, which is recorded in the environment variable $LAPTREE.
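As an illustration of the no-absolute-paths rule, a wrapper could resolve every tool location relative to $LAPTREE. The following is a minimal sketch only: the helper names lap_tree_root and tool_path, and the example location tokenizer/bin/run, are hypothetical and not part of the LAP code base.

```python
import os
from pathlib import Path


def lap_tree_root() -> Path:
    """Return the LAP Tree root recorded in the LAPTREE environment variable."""
    root = os.environ.get("LAPTREE")
    if not root:
        raise RuntimeError("LAPTREE is not set; cannot locate the LAP Tree")
    return Path(root)


def tool_path(*relative_parts: str) -> Path:
    """Resolve a location below the LAP Tree root (hypothetical helper).

    Absolute paths are rejected, so that an installation of the tree
    can be moved to an arbitrary location.
    """
    relative = Path(*relative_parts)
    if relative.is_absolute():
        raise ValueError(f"expected a path relative to the LAP Tree: {relative}")
    return lap_tree_root() / relative
```

With LAPTREE pointing at some installation directory, a call such as tool_path("tokenizer", "bin", "run") yields a path below that directory, and relocating the tree only requires changing the environment variable.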
Tools in LAP typically receive some input to be processed and upon completion provide some result data, for example further analysis or annotation of the input. All input and output data of LAP tools is represented using the data model of the Linguistic Annotation Framework (LAF). LAF annotation records (in LAP) are typically serialized in JSON format and stored in a MongoDB (NoSQL) database. This LAP-internal repository of annotation records is dubbed the LAP Store.
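To make the idea of a JSON-serialized annotation record more concrete, here is a rough sketch of stand-off annotations over character offsets. The record layout (the keys document, text, annotations and the fields of Annotation) is an assumption made for illustration; it is neither the actual LAP Store schema nor a complete rendering of the LAF data model.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Annotation:
    """One annotation over a character span of the text (hypothetical layout)."""
    label: str        # kind of annotation, e.g. "token" or "sentence"
    start: int        # character offset where the span begins
    end: int          # character offset just past the end of the span
    features: dict = field(default_factory=dict)


def to_record(document_id: str, text: str, annotations: list) -> str:
    """Serialize a text and its annotations into a single JSON record."""
    record = {
        "document": document_id,
        "text": text,
        "annotations": [asdict(a) for a in annotations],
    }
    return json.dumps(record, sort_keys=True)


text = "LAP works."
tokens = [
    Annotation("token", 0, 3, {"pos": "NNP"}),
    Annotation("token", 4, 9, {"pos": "VBZ"}),
    Annotation("token", 9, 10, {"pos": "."}),
]
record = to_record("doc-1", text, tokens)
```

Because annotations only point into the text by offset, a later tool (say, a parser) could add its own annotations to the same record without modifying what earlier tools produced.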
For integration with LAP, tools must be ‘wrapped’ so as to (a) communicate (read and write annotations) with the LAP Store and (b) register with and execute under Galaxy control. Seeing that Galaxy proper is implemented in Python, these wrappers will often be implemented in Python too (though a Java API, for example, might well be considered in the future). To avoid duplication of LAP-specific wrapper functionality, we designate
The LAP heart, in a sense, is an instance of the Galaxy Framework, a software platform for data-intensive processing originally developed for applications in bio-informatics. Galaxy internally can be sub-divided into various components. Following are brief characteristics of some noteworthy ones:
- Galaxy System User: Galaxy runs as a non-privileged user laportal (uid=226904) with Unix group laportal (gid=160632).
- Galaxy SQL Database: For internal storage, Galaxy uses a PostgreSQL database hpc_lap hosted on dbpg-it-forskning.uio.no.
- Galaxy Data Directory: Besides the database (which is primarily used for metadata), Galaxy stores actual data files in the filesystem; LAP has a designated project area /projects/lap/, which is NFS-mounted on the LAP front-end node (ps.hpc) and all cluster nodes; the so-called database/ sub-directory of the Galaxy instance is soft-linked to /projects/lap/data/galaxy/.
- Galaxy Web Server: Galaxy operates its own internal web server on port 8080, but for better scalability and authentication support, we wrap it behind an Apache HTTP Server proxy (see below).
- Galaxy Tool Descriptions: In addition to global Galaxy configuration, each tool needs to be registered with Galaxy through a tool description, an XML file that Galaxy expects to find in its tools/ sub-directory.
- Galaxy Job Submission:
- Galaxy Accounting:
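Regarding the tool descriptions mentioned in the list above, the sketch below generates a bare-bones Galaxy tool XML file from Python. The element names (tool, description, command, inputs/param, outputs/data) follow the general Galaxy tool format; the tool id lap_tokenizer, the script name tokenize.py, and the txt data formats are invented for illustration and do not describe an actual LAP tool.

```python
import xml.etree.ElementTree as ET


def build_tool_description(tool_id: str, name: str, command: str) -> str:
    """Build a minimal Galaxy tool description with one input and one output."""
    tool = ET.Element("tool", {"id": tool_id, "name": name, "version": "0.1"})
    ET.SubElement(tool, "description").text = "hypothetical example tool"
    # The command line template; $input and $output refer to the
    # parameter and data names declared below.
    ET.SubElement(tool, "command").text = command
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", {
        "name": "input", "type": "data", "format": "txt", "label": "Input text",
    })
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", {"name": "output", "format": "txt"})
    return ET.tostring(tool, encoding="unicode")


xml_text = build_tool_description(
    "lap_tokenizer", "Tokenizer", "tokenize.py $input $output"
)
```

The resulting string would be written to a file below the tools/ sub-directory; how the file is then registered in the global Galaxy configuration is not covered here.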