Commit b61a22e

Move fsdb.mdx into OSS visible features directory
Summary: As titled

Reviewed By: joseph5wu

Differential Revision: D86640166

fbshipit-source-id: c5d46c0ffc687a99886989a60dfee25711bbcd6f
1 parent 6b0b6a5 commit b61a22e

1 file changed

+192
-0
lines changed
  • fboss/agent/wiki/static_docs/docs/Features

@@ -0,0 +1,192 @@
1+
---
2+
id: Features_fsdb
3+
title: FSDB
4+
custom_edit_url: https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/fboss/agent/wiki/static_docs/docs/Features/fsdb.mdx
5+
---
6+
7+
# FSDB
8+
9+
## Overview
10+
11+
### tl;dr
12+
13+
* A central store for all state of a switch.
14+
15+
* Follows a pub/sub model backed by thrift streams
16+
17+
* On-box services can publish their state (examples) and stats
18+
19+
* Both on-box and off-box services can subscribe to this data.
20+
21+
* See [FSDB Model](https://www.internalfb.com/code/fbsource/fbcode/fboss/fsdb/if/facebook/fsdb_model.thrift) for all data FSDB holds. Each core fboss service owns a branch of this state and is in charge of keeping it updated through the [Pub Sub Protocol](https://fb.quip.com/r0wqAF9D12SR).
22+
23+
* Clients can use the client libraries (currently C++ only) to publish/subscribe to FSDB
24+
25+
26+
### What
27+
28+
The FBOSS forwarding stack consists of several binaries: wedge_agent, qsfp_service, bgp, openr, etc. At any instant, the processes for these binaries on a switch have associated stats and operational state. FSDB is an FBOSS forwarding stack binary that lets on-box processes publish all their data to the on-box FSDB instance. Subscribers, whether on-box or off-box, can subscribe to any subset of the data available in FSDB.
29+
30+
### Why
31+
32+
A lot of data from forwarding stack binaries is needed by other on-box and off-box processes. Without FSDB, each binary needs to add custom thrift APIs for each individual fragment of data to make it available. This comes with costs that FSDB addresses:
33+
34+
* Polling is required if a service needs live data from another service
35+
36+
* Extra development effort whenever new data needs to be exposed
37+
38+
* CPU cycles are needed to serve potentially high-throughput (short-interval polling) clients
39+
40+
41+
Furthermore, having all state piped to a central store enables useful applications such as custom historical stats, simple data aggregation, and state replay.
42+
43+
### Terminology
44+
45+
FSDB’s model, viewable at [FSDB Model](https://www.internalfb.com/code/fbsource/fbcode/fboss/fsdb/if/facebook/fsdb_model.thrift), is split into two parts:
46+
47+
* State: Data that changes sporadically and is generally event based, e.g. port operational states, neighbor maps, configs etc.
48+
49+
* Stats: Statistical data that generally changes periodically, e.g. hw port stats, link flaps etc.
50+
51+
52+
FSDB pub/sub APIs operate on a given path in these models, with two general operational modes (more on this in the Client Libraries section):
53+
54+
* Path: Operate on a complete object at the given path. i.e. serve entire object at given path on change to any subpath
55+
56+
* Delta: Operate on deltas at the given path. i.e. serve granular deltas on changes to subpaths under the given path
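To make the two modes concrete, here is a minimal Python sketch that models published state as a nested dict. It is not the FSDB API: `get_at_path`, `diff_at_path`, and the `ports`/`eth1` names are hypothetical and exist only to illustrate what a Path subscriber versus a Delta subscriber would be served after a single change.

```python
from typing import Any, Dict, List, Tuple

State = Dict[str, Any]


def get_at_path(state: State, path: List[str]) -> Any:
    """Path mode: return the complete object stored at `path`."""
    node: Any = state
    for token in path:
        node = node[token]
    return node


def diff_at_path(
    old: State, new: State, path: List[str]
) -> List[Tuple[List[str], Any, Any]]:
    """Delta mode: return (subpath, old_value, new_value) per changed leaf."""
    changes: List[Tuple[List[str], Any, Any]] = []

    def walk(a: Any, b: Any, prefix: List[str]) -> None:
        if isinstance(a, dict) and isinstance(b, dict):
            for key in sorted(set(a) | set(b)):
                walk(a.get(key), b.get(key), prefix + [key])
        elif a != b:
            changes.append((prefix, a, b))

    walk(get_at_path(old, path), get_at_path(new, path), list(path))
    return changes


# Hypothetical state before and after one port goes down.
old_state = {"agent": {"ports": {"eth1": {"oper": "UP"}, "eth2": {"oper": "UP"}}}}
new_state = {"agent": {"ports": {"eth1": {"oper": "DOWN"}, "eth2": {"oper": "UP"}}}}

# A Path subscriber at ["agent", "ports"] is served the whole object again.
path_update = get_at_path(new_state, ["agent", "ports"])
# A Delta subscriber at the same path is served only the changed leaf.
delta_update = diff_at_path(old_state, new_state, ["agent", "ports"])
```

The Path result is simpler to consume but grows with the size of the object, while the Delta result stays proportional to what changed and leaves the merge to the client.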
57+
58+
59+
## Integration Guide
60+
61+
See [FSDB API](https://www.internalfb.com/code/fbsource/fbcode/fboss/fsdb/if/facebook/fsdb.thrift) for all exposed thrift APIs for subscribing and publishing. All clients need to follow the [Pub Sub Protocol](https://fb.quip.com/r0wqAF9D12SR). For C++ clients, please also see the Client Libraries section.
62+
63+
### Subscribers
64+
65+
*Runbook:*
66+
67+
1. Check if desired data is in [FSDB Model](https://www.internalfb.com/code/fbsource/fbcode/fboss/fsdb/if/facebook/fsdb_model.thrift)
68+
69+
2. Double check with the team that owns the component that their service is already publishing the data (rollout constraints). You may also use the fboss2 CLI to validate (see CLI section)
70+
71+
3. Pick the type of subscription: Path or Delta. Delta subscriptions are more efficient on the transport but may require more bookkeeping on the client side
72+
73+
4. Find the path in the model you want to subscribe to. For example, agent sw config in [FSDB Model](https://www.internalfb.com/code/fbsource/fbcode/fboss/fsdb/if/facebook/fsdb_model.thrift) is at path ["agent", "config", "sw"]
74+
75+
5. Use a subscribe api from [FSDB API](https://www.internalfb.com/code/fbsource/fbcode/fboss/fsdb/if/facebook/fsdb.thrift) that suits your needs
76+
77+
78+
For more complex subscriptions you may also use the extended APIs, which accept regex paths (see Extended Paths section).
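As a rough illustration of what a regex path could select, the sketch below matches a pattern against concrete paths token by token. This is an assumption about the semantics, not the FSDB implementation: the per-token full-match rule, the `matches_extended_path` helper, and the `ports`/`eth1` paths are all hypothetical, so check the extended APIs in the FSDB API thrift file for the real behavior.

```python
import re
from typing import List


def matches_extended_path(pattern: List[str], path: List[str]) -> bool:
    """Return True if every token of `path` fully matches the regex at the
    same position in `pattern` (assumed semantics, for illustration only)."""
    if len(pattern) != len(path):
        return False
    return all(
        re.fullmatch(regex, token) is not None
        for regex, token in zip(pattern, path)
    )


# Hypothetical concrete paths present in the published state.
published_paths = [
    ["agent", "ports", "eth1", "oper"],
    ["agent", "ports", "eth20", "oper"],
    ["agent", "ports", "fab1", "oper"],
    ["agent", "config", "sw"],
]

# One extended subscription covering the oper state of every "eth" port.
pattern = ["agent", "ports", r"eth\d+", "oper"]
matched = [p for p in published_paths if matches_extended_path(pattern, p)]
```

A single extended subscription of this kind replaces one plain subscription per port, which is the main reason to reach for it.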
79+
80+
### Publishers
81+
82+
*Runbook:*
83+
84+
1. For new services onboarding, first you need to model the internal state of your service in a way that makes sense for both your service and external clients.
85+
86+
2. Create a thrift struct that models this internal state and add it to [FSDB Model](https://www.internalfb.com/code/fbsource/fbcode/fboss/fsdb/if/facebook/fsdb_model.thrift)
87+
88+
3. Modify [FSDB Config](https://www.internalfb.com/code/configerator/[094f1e2b7833]/source/neteng/fboss/coop/inputs/fsdb_template.mcconf) to recognize your service as a publisher, and add the path in the model that your service will own
89+
90+
4. Publish to your service’s path making sure to follow the [Pub Sub Protocol](https://fb.quip.com/r0wqAF9D12SR)
91+
92+
5. Test that FSDB receives the expected data (see Testing section)
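Step 3 above ties each publisher to the part of the model it owns. The sketch below shows one way such an ownership check could work, under stated assumptions: the `OWNED_ROOTS` table, its entries, and `may_publish` are hypothetical and do not reflect the actual FSDB config format or server code. The `check_ownership` argument mirrors the intent of the `checkOperOwnership` flag described under Running FSDB.

```python
from typing import Dict, List

# Hypothetical ownership table: publisher id -> root path it owns.
OWNED_ROOTS: Dict[str, List[str]] = {
    "wedge_agent": ["agent"],
    "qsfp_service": ["qsfp_service"],
}


def may_publish(
    publisher_id: str, path: List[str], check_ownership: bool = True
) -> bool:
    """Allow a publish only if `path` lies under the publisher's owned root.

    With check_ownership=False every publish is allowed, which is what a
    test setup would want.
    """
    if not check_ownership:
        return True
    root = OWNED_ROOTS.get(publisher_id)
    return root is not None and path[: len(root)] == root


allowed = may_publish("wedge_agent", ["agent", "config", "sw"])
rejected = may_publish("qsfp_service", ["agent", "config", "sw"])
```

Keeping one owner per branch means subscribers never have to reconcile conflicting writers for the same path.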
93+
94+
95+
## Client Libraries
96+
97+
FSDB uses thrift streams for subscribing and publishing. We have some libraries to help manage these streams and the threading model. Currently we only have libraries in C++.
98+
99+
### FsdbPubSubManager
100+
101+
[FsdbStreamClient](https://www.internalfb.com/code/fbsource/[14213339a29e]/fbcode/fboss/fsdb/client/FsdbStreamClient.h) is a low-level client for establishing a stream with FSDB and handling reconnections.
102+
103+
[FsdbPubSubManager](https://www.internalfb.com/code/fbsource/[14213339a29e]/fbcode/fboss/fsdb/client/FsdbPubSubManager.h) is a higher-level abstraction around FsdbStreamClient that manages multiple thrift streams. As a client you may have multiple subscriptions (to different paths) and multiple types of publishers (state and stats). [FsdbPubSubManager](https://www.internalfb.com/code/fbsource/[14213339a29e]/fbcode/fboss/fsdb/client/FsdbPubSubManager.h) will maintain all these streams.
104+
105+
### FsdbSyncManager
106+
107+
Your service may have multiple components that are in charge of different parts of the published state, each running on a different thread. As an example, two such components in wedge_agent are SwitchState and lldp state. We want each component to be in charge of its own publishing, while still following the [Pub Sub Protocol](https://fb.quip.com/r0wqAF9D12SR) and performing a properly synchronized initial sync. [FsdbSyncManager](https://www.internalfb.com/code/fbsource/[14213339a29e]/fbcode/fboss/fsdb/client/FsdbSyncManager.h) abstracts this logic out.
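The coordination problem can be sketched in a few lines of Python. This is not the FsdbSyncManager API: `SyncManagerSketch`, its methods, and the `("full", ...)` / `("delta", ...)` message shapes are hypothetical. The sketch only shows the idea of buffering per-component updates until one synchronized full snapshot has been sent, then forwarding later updates as deltas.

```python
import threading
from typing import Any, Callable, Dict, List, Tuple

Message = Tuple[str, Dict[str, Any]]


class SyncManagerSketch:
    """Serializes updates from several components into one publish stream."""

    def __init__(self, publish: Callable[[Message], None]) -> None:
        self._lock = threading.Lock()
        self._state: Dict[str, Any] = {}
        self._initial_sync_done = False
        self._publish = publish

    def update(self, component: str, value: Any) -> None:
        """Called by a component, possibly from its own thread."""
        with self._lock:
            self._state[component] = value
            # Before the initial sync, updates are only buffered in _state.
            if self._initial_sync_done:
                self._publish(("delta", {component: value}))

    def start(self) -> None:
        """Send one full snapshot, then switch to delta publishing."""
        with self._lock:
            self._publish(("full", dict(self._state)))
            self._initial_sync_done = True


published: List[Message] = []
manager = SyncManagerSketch(published.append)

# Components report their state before the stream is up: buffered only.
manager.update("switchState", {"ports": 2})
manager.update("lldp", {"neighbors": 1})
manager.start()

# After the initial sync, each component publishes from its own thread.
workers = [
    threading.Thread(target=manager.update, args=(name, value))
    for name, value in [("switchState", {"ports": 3}), ("lldp", {"neighbors": 2})]
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Publishing while holding the lock keeps the full snapshot strictly ahead of every delta, at the cost of serializing all components behind one lock.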
108+
109+
## Testing
110+
111+
### Unit Testing
112+
113+
* Testing interactions with FSDB server:
114+
* Leverage [FsdbTestServer](https://www.internalfb.com/code/fbsource/[6680b5af90f1f2e6fa910a57611540e26d482688]/fbcode/fboss/fsdb/tests/utils/FsdbTestServer.h?lines=15) to create an in-process FSDB server instance for use in UTs. This allows implementing validations to make use of FSDB server's internal state.
115+
116+
  * Example: See [FsdbSyncer tests](https://www.internalfb.com/code/fbsource/[6680b5af90f1f2e6fa910a57611540e26d482688]/fbcode/fboss/agent/test/fsdb_tests/facebook/FsdbSyncerTest.cpp?lines=23) as an example of using FsdbTestServer. These UTs can be run on a devserver as:
117+
118+
```buck2 test //fboss/agent/test/fsdb_tests/facebook:fsdb_tests```
119+
120+
121+
### Integration tests
122+
123+
FBOSS integration tests run fsdb as a service to test integration with various clients including bgp, nsdb and wedge_agent.
124+
Example command to run routing integration tests:
125+
```
netcastle --team fboss_integration --test-config routing/tomahawk3/10.2.0.0_odp__10.2.0.0_odp/brcm
```
128+
129+
During FSDB client development, to run FSDB as a service for testing, the integration test framework can be used to quickly set up the service as follows:
130+
```
netcastle --team fboss_integration --test-config routing/tomahawk3/10.2.0.0_odp__10.2.0.0_odp/brcm --skip-cleanup --basset-query $query_for_reserved_device
```
133+
With the skip-cleanup flag, the netcastle runner leaves the test services running on the reserved device after the integration tests complete. This is a quick and easy way to set up the FSDB service for testing.
134+
Note: Please use the skip-cleanup option ONLY WITH YOUR RESERVED LAB DEVICE.
135+
136+
### Running FSDB
137+
138+
* For testing purposes, the FSDB process can be run on a device as follows:
139+
``` sudo ./fsdb --thrift_ssl_policy=permitted --checkOperOwnership=false ```
140+
141+
* Useful Config and CLI Flags
142+
* thrift_ssl_policy: indicates whether SSL is required/permitted/disabled for Thrift calls to FSDB server.
143+
* checkOperOwnership: whether to perform permission checks on publisher id
144+
* stateSubscriptionServe_ms: interval at which FSDB serves state subscriptions
145+
* statsSubscriptionServe_ms: interval at which FSDB serves stats subscriptions
146+
147+
148+
## CLI
149+
150+
* Show currently active publishers:
151+
```fboss2 show fsdb publishers```
152+
153+
* Show currently active subscriptions:
154+
```fboss2 show fsdb subscribers```
155+
156+
* Show state published under specified path:
157+
```fboss2 show fsdb state 'agent'```
158+
159+
* Show stats published under specified path:
160+
```fboss2 show fsdb stats 'agent/hwPortStats'```
161+
162+
## Advanced Features
163+
164+
### Extended Paths
165+
166+
### Metadata + Publisher Roots
167+
168+
### FSDB Timeouts
169+
170+
FSDB relies on [thrift stream](https://www.internalfb.com/intern/staticdocs/thrift/docs/fb/features/streaming/) and [thrift sink](https://www.internalfb.com/intern/staticdocs/thrift/docs/fb/features/streaming/sink) to support subscribers and publishers respectively.
171+
To protect the streams against bad servers and clients, we have various timeouts implemented. The reference wikis [ref1](https://www.internalfb.com/intern/staticdocs/thrift/docs/fb/features/streaming/#timeouts) and [ref2](https://www.internalfb.com/wiki/ServiceRouter/Overview/Timeout/) capture good details about the various timeouts
172+
supported by thrift and service router.
173+
174+
* Client side: The main timeout we are interested in on the client side is the [chunk timeout](https://www.internalfb.com/code/fbsource/[b0bf4ae098f75b0512fd335e4450975ac2748abf]/fbcode/fboss/fsdb/client/FsdbStreamClient.cpp?lines=74). Currently, it can be configured by the client using the *fsdb_state_chunk_timeout* flag. This value is set to 12s, except for DSF where it is configured to 15s
175+
* Server side: On the server side, we rely on the [stream expire timeout](https://www.internalfb.com/code/fbsource/[a96761ae342fda104fd67a20eab24ed59945b478]/fbcode/fboss/fsdb/server/Server.cpp?lines=136) to ensure that bad clients are removed. This is set to 15 minutes in the FSDB server (a historical value that can be reconfigured if needed). However, the stream expire timeout only starts after the server-side credits are exhausted.
176+
    * A client provides 100 credits (configurable) to the server by default. When the server has consumed half of the provided credits, the client replenishes them
177+
    * In case of a bad client, the credits won't be replenished and the server might run out of the provided credits. Once the credits are exhausted, the stream expire timeout is started, and once it expires, the client is removed.
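The credit mechanism described above can be simulated in a few lines. This is a simplified model, not thrift's implementation: `CreditedStream` and `messages_until_stalled` are hypothetical names, and topping the credits back up to the initial value is an assumption about how replenishment works. Only the default of 100 credits and the "replenish at half" rule come from the text.

```python
class CreditedStream:
    """Server-side view of a stream that may only send while it has credits."""

    def __init__(self, initial_credits: int = 100) -> None:
        self.initial_credits = initial_credits
        self.credits = initial_credits
        self.sent = 0

    def send(self, client_is_healthy: bool) -> bool:
        """Send one message; return False if the server is out of credits."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.sent += 1
        # A healthy client replenishes once half of the credits are consumed
        # (assumed here to restore the full initial amount).
        if client_is_healthy and self.credits <= self.initial_credits // 2:
            self.credits = self.initial_credits
        return True


def messages_until_stalled(client_is_healthy: bool, attempts: int) -> int:
    """Count messages delivered before the stream stalls or attempts run out."""
    stream = CreditedStream()
    for _ in range(attempts):
        if not stream.send(client_is_healthy):
            break
    return stream.sent


healthy_sent = messages_until_stalled(client_is_healthy=True, attempts=1000)
bad_client_sent = messages_until_stalled(client_is_healthy=False, attempts=1000)
```

In this model a healthy client never stalls the server, while a bad client stalls it after exactly the initial 100 messages, which is the point where the stream expire timeout would start counting.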
178+
179+
To ensure we don't hit timeouts, we have a heartbeat mechanism where the FSDB server sends an empty message to the client to keep the socket alive. This heartbeat interval is set to 5s by [default](https://www.internalfb.com/code/fbsource/[a96761ae342fda104fd67a20eab24ed59945b478]/fbcode/fboss/fsdb/server/ServiceHandler.cpp?lines=52). This can be configured by the client when registering for a subscription. For example, DSF has set this to [2s](https://www.internalfb.com/code/fbsource/[a96761ae342fda104fd67a20eab24ed59945b478]/fbcode/fboss/agent/DsfSubscriber.cpp?lines=15)
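The relationship between the heartbeat interval and the client chunk timeout is simple arithmetic, sketched below. The two helper functions are hypothetical and only illustrate the reasoning; the 12s/5s defaults and the 15s/2s DSF values are the ones quoted in this section.

```python
def heartbeats_within_timeout(
    chunk_timeout_s: float, heartbeat_interval_s: float
) -> int:
    """Number of whole heartbeat intervals that fit in one chunk timeout."""
    return int(chunk_timeout_s // heartbeat_interval_s)


def is_stream_stale(
    last_message_s: float, now_s: float, chunk_timeout_s: float
) -> bool:
    """Client-side check: has nothing (data or heartbeat) arrived in time?"""
    return now_s - last_message_s > chunk_timeout_s


# Defaults: 12s chunk timeout, 5s heartbeat interval.
default_budget = heartbeats_within_timeout(12, 5)
# DSF: 15s chunk timeout, 2s heartbeat interval.
dsf_budget = heartbeats_within_timeout(15, 2)
```

With the defaults, two heartbeat intervals fit inside the chunk timeout; with the DSF settings, seven do, so a DSF subscriber tolerates more consecutive lost heartbeats before it treats the stream as dead.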
180+
181+
:::note
182+
Even though we have set the stream expire timeout to 15 minutes, experimentation revealed that bad clients are closed within 1 minute (~54 seconds). This is because of the TCP-level ACKs to our heartbeats: if the heartbeats are not ACKed because of client issues, the socket is closed and the server disconnects the bad client. When testing without heartbeats, we observed the timeout kicking in after ~18 minutes (credit expiry + timeout).
183+
:::
184+
185+
186+
187+
188+
## Server Implementation
189+
190+
### ThriftPath + Cow storage
191+
192+
* TODO:
