Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NervesHubLink Fails to Start #52

Closed
psteininger opened this issue Dec 3, 2020 · 21 comments
Closed

NervesHubLink Fails to Start #52

psteininger opened this issue Dec 3, 2020 · 21 comments

Comments

@psteininger
Copy link

I was trying to get a certificate set up on the ATECC608A today, and I ran into the following error:

15:33:26.670 [info]  Application nerves_hub_link exited: exited in: NervesHubLink.Application.start(:normal, [])
    ** (EXIT) exited in: GenServer.call(ATECC508A.Transport.I2CSupervisor, {:start_child, {{ATECC508A.Transport.I2CServer, :start_link, [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]}, :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}}, :infinity)
        ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started

I also get this when I run this from ssh console on the device:

iex(2)> Application.ensure_all_started(:thirsty)
{:error,
 {:nerves_hub_link,
  {:bad_return,
   {{NervesHubLink.Application, :start, [:normal, []]},
    {:EXIT,
     {:noproc,
      {GenServer, :call,
       [
         ATECC508A.Transport.I2CSupervisor,
         {:start_child,
          {{ATECC508A.Transport.I2CServer, :start_link,
            [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]},
           :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}},
         :infinity
       ]}}}}}}}

Trying to establish connection to ATEC608A results in similar error:

iex(3)> {:ok, i2c} = ATECC508A.Transport.I2C.init(bus_name: "i2c-1")
** (exit) exited in: GenServer.call(ATECC508A.Transport.I2CSupervisor, {:start_child, {{ATECC508A.Transport.I2CServer, :start_link, [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]}, :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}}, :infinity)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.11.2) lib/gen_server.ex:1017: GenServer.call/3
    lib/atecc508a/transport/i2c.ex:43: ATECC508A.Transport.I2C.init/1

I am pretty stumped as to what is wrong. This runs on a Raspberry Pi Zero W rev 1.2, I really appreciate some guidance on this.
These are my current dependencies:

defp deps do
    [
      # Dependencies for all targets
      {:nerves, "~> 1.7.0", runtime: false},
      {:nerves_hub_cli, "~> 0.10.4", runtime: false},
      {:nerves_hub_link, "~> 0.9.2"},
      {:nerves_time, "~> 0.2"},
      {:nerves_key, "~> 0.5"},

      {:toolshed, "~> 0.2.13"},
      {:shoehorn, "~> 0.7"},
      {:ring_logger, "~> 0.8.1"},
      {:circuits_i2c, "~> 0.3.7"},
      {:circuits_gpio, "~> 0.4.6"},
      {:ads1115, "~> 0.1"},
      {:gen_rmq, "~> 3.0"},
      {:jason, "~> 1.2"},
      {:wafer, git: "https://gitlab.com/jimsy/wafer"},


      # Dependencies for all targets except :host
      {:nerves_runtime, "~> 0.11.3"},
      {:nerves_pack, "~> 0.4.0", targets: @all_targets},
      {:busybox, "~> 0.1", targets: @all_targets},


      # Dependencies for specific targets
      {:nerves_system_rpi, "~> 1.13", runtime: false, targets: :rpi},
      {:nerves_system_rpi0, "~> 1.13", runtime: false, targets: :rpi0},
      {:nerves_system_rpi2, "~> 1.13", runtime: false, targets: :rpi2},
      {:nerves_system_rpi3, "~> 1.13", runtime: false, targets: :rpi3},
      {:nerves_system_rpi3a, "~> 1.13", runtime: false, targets: :rpi3a},
      {:nerves_system_rpi4, "~> 1.13", runtime: false, targets: :rpi4},
      {:nerves_system_bbb, "~> 2.6", runtime: false, targets: :bbb},
      {:nerves_system_x86_64, "~> 1.13", runtime: false, targets: :x86_64},
    ]
  end
@fhunleth
Copy link
Contributor

fhunleth commented Dec 3, 2020

What device are you trying this on? I'm inferring that it's not a Raspberry Pi Zero based on your comment.

@fhunleth
Copy link
Contributor

fhunleth commented Dec 3, 2020

Oh, I think that I misread your comment. You're saying that it is running on a Raspberry Pi Zero.

@fhunleth
Copy link
Contributor

fhunleth commented Dec 3, 2020

Could you run Application.ensure_all_started(:nerves_key) and then run Application.ensure_all_started(:thirsty)?

nerves_hub_link has an optional dependency on :nerves_key. Optional dependencies aren't started by Application.ensure_all_started/1.

@psteininger
Copy link
Author

psteininger commented Dec 3, 2020

Application.ensure_all_started(:nerves_key)

{:error,
 {:atecc508a,
  {{:shutdown,
    {:failed_to_start_child, ATECC508A.Transport.I2CSupervisor,
     {:already_started, #PID<0.1836.0>}}},
   {ATECC508A.Application, :start, [:normal, []]}}}}

@psteininger
Copy link
Author

psteininger commented Dec 3, 2020

@fhunleth thanks for replying so quickly :). I want to add one more point. this happens before wlan0 is enabled (it becomes active about 5 seconds later).

I was able to start the supervisor manually, that's why the above error is confusing.

Here's what I got after reboot:

iex(1)> Application.ensure_all_started(:nerves_key)
{:ok, [:x509, :circuits_i2c, :atecc508a, :nerves_key_pkcs11, :nerves_key]}

@psteininger
Copy link
Author

Hmmm... this is a bit confusing. Now I am able to:

iex(3)> {:ok, i2c} = ATECC508A.Transport.I2C.init([])
{:ok, {ATECC508A.Transport.I2C, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"}}

@psteininger
Copy link
Author

I do however, still see this:

iex(6)> Application.ensure_all_started(:nerves_hub_link)       
{:error,
 {:nerves_hub_link,
  {:bad_return,
   {{NervesHubLink.Application, :start, [:normal, []]},
    {:EXIT,
     {{:badmatch, {:error, :execution_error}},
      [
        {NervesKey, :device_cert, 2, [file: 'lib/nerves_key.ex', line: 97]},
        {NervesHubLink.Configurator.NervesKey, :"-maybe_add_cert/3-fun-0-", 2,
         [file: 'lib/nerves_hub_link/configurators/nerves_key.ex', line: 28]},
        {Keyword, :put_new_lazy, 3, [file: 'lib/keyword.ex', line: 629]},
        {NervesHubLink.Configurator.NervesKey, :build, 1,
         [file: 'lib/nerves_hub_link/configurators/nerves_key.ex', line: 16]},
        {NervesHubLink.Configurator, :build, 0,
         [file: 'lib/nerves_hub_link/configurator.ex', line: 21]},
        {NervesHubLink.Application, :start, 2,
         [file: 'lib/nerves_hub_link/application.ex', line: 14]},
        {:application_master, :start_it_old, 4,
         [file: 'application_master.erl', line: 277]}
      ]}}}}}}

and I only have this in config.exs for nerves_hub_link:

config :nerves_hub_link,
       fwup_public_keys: [:devkey]

@jjcarstens
Copy link
Collaborator

This last error is because NervesHubLink tries to use the NervesKey automatically if the dependency is detected. My guess is that it fails here because the ATECC508A is not yet provisioned, which causes the failure trying to read the necessary certs from it. Are you able to manually get the device and signer cert from your NervesKey?

@psteininger
Copy link
Author

@jjcarstens I haven't provisioned the device yet. I was stuck unable to connect to it because ATECC508A.Transport.I2C.init(bus_name: "i2c-1") was erroring out. It seems to be working now.
I will try to get the certs loaded up tomorrow (it's 5:30 pm here and thus dinner time ;) ). I will update here tomorrow.

Thank you both for responding so quickly :). It feels great to receive that kind of support.

@psteininger
Copy link
Author

I tried to recreate my original condition and commenting out these 2 lines in config.exs did the trick:

config :nerves_hub_link,
       fwup_public_keys: [:devkey]

however, after putting them back in, rebuilding firmware and uploading I still get the original error:

iex(1)> {:ok, i2c} = ATECC508A.Transport.I2C.init(bus_name: "i2c-1")
** (exit) exited in: GenServer.call(ATECC508A.Transport.I2CSupervisor, {:start_child, {{ATECC508A.Transport.I2CServer, :start_link, [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]}, :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}}, :infinity)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.11.2) lib/gen_server.ex:1017: GenServer.call/3
    lib/atecc508a/transport/i2c.ex:43: ATECC508A.Transport.I2C.init/1

This is what I get for an error:

iex(1)> Application.ensure_all_started(:nerves_hub_link)            
{:error,
 {:nerves_hub_link,
  {:bad_return,
   {{NervesHubLink.Application, :start, [:normal, []]},
    {:EXIT,
     {:noproc,
      {GenServer, :call,
       [
         ATECC508A.Transport.I2CSupervisor,
         {:start_child,
          {{ATECC508A.Transport.I2CServer, :start_link,
            [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]},
           :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}},
         :infinity
       ]}}}}}}}

I will keep digging in more and hopefully I will figure this out

@psteininger
Copy link
Author

Hmm... so if I run this:

iex(2)> Application.ensure_all_started(:nerves_key)                 
{:ok, [:x509, :circuits_i2c, :atecc508a, :nerves_key_pkcs11, :nerves_key]}

This then works:

{:ok, i2c} = ATECC508A.Transport.I2C.init([])
{:ok, {ATECC508A.Transport.I2C, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"}}

so how do I make sure this happens at startup?

@psteininger
Copy link
Author

psteininger commented Dec 4, 2020

I was able to provision the chip with a signer cert. Side note here - I would suggest updating docs to specify --years-valid by default or set the default years to 10 or longer. Writing permanently a soon-to-expire cert into a chip is a bit of a trap.

So after updating config.exs to include:

config :nerves_hub_link,
       fwup_public_keys: [:devkey]

config :nerves_hub_link, :nerves_key,
       certificate_pair: :primary,
       i2c_bus: 1

I still experience the same issue upon reboot (with updated firmware)

iex(1)> Application.ensure_all_started(:nerves_hub_link)                                     
{:error,
 {:nerves_hub_link,
  {:bad_return,
   {{NervesHubLink.Application, :start, [:normal, []]},
    {:EXIT,
     {:noproc,
      {GenServer, :call,
       [
         ATECC508A.Transport.I2CSupervisor,
         {:start_child,
          {{ATECC508A.Transport.I2CServer, :start_link,
            [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]},
           :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}},
         :infinity
       ]}}}}}}}

But... if I do the following:

iex(2)> Application.ensure_all_started(:nerves_key)     
{:ok, [:x509, :circuits_i2c, :atecc508a, :nerves_key_pkcs11, :nerves_key]}

Then this starts up ok:

iex(3)> Application.ensure_all_started(:nerves_hub_link)
{:ok,
 [:extty, :fwup, :unicode_util_compat, :idna, :mimerl, :certifi, :syntax_tools,
  :parse_trans, :ssl_verify_fun, :metrics, :hackney, :jason, :inets,
  :websocket_client, :phoenix_client, :nerves_hub_link]}

So, the question is... how to permanently fix it - is this something I need to address in my application.ex somehow or it's something that nerves_hub_link should deal with. It just seems that the :nerves_key application is not starting at all, or at least not before :nerves_hub_link.

@psteininger
Copy link
Author

One last bit of data: Just running this:

Application.ensure_all_started(:atecc508a) 

is enough to allow ATECC508A.Transport.I2C.init([]) to work correctly. So, for some reason, the ATECC508A.Application and the ATECC508A.Transport.I2CSupervisor don't get started

@fhunleth
Copy link
Contributor

fhunleth commented Dec 4, 2020

@jjcarstens What do you think about putting Application.ensure_all_started(:nerves_key) in :nerves_hub_link to force it to start first. I don't know any other way around the optional dependency issue and I haven't seen much progress on accepting Jose's OTP PR in a while.

@psteininger
Copy link
Author

@fhunleth where is the :nerves_key application defined? I can't see NervesKey.Application defined anywhere. How does that work?

@fhunleth
Copy link
Contributor

fhunleth commented Dec 4, 2020

It's not since :nerves_key is a pure library. However, if that were to ever change, then I didn't want to have to remember to update :nerves_hub_link.

I think that :nerves_hub_link only has an optional dependency on :nerves_key, so it seems wrong for it to call ensure_all_started on anything but its direct dependencies. Given that there's a call to ATECC508A.Transport.I2C.init/1, there's an argument that it should directly depend on :atecc508a too.

@psteininger
Copy link
Author

Given that there's a call to ATECC508A.Transport.I2C.init/1, there's an argument that it should directly depend on :atecc508a too.

@fhunleth I think this makes sense.

In the meantime, is there any reliable way for me to make sure that the ATECC508A.Transport.I2CSupervisor starts up when my app starts up?

@fhunleth
Copy link
Contributor

fhunleth commented Dec 4, 2020

:shoehorn loads the :init list in order, so if you add :atecc508a to the front of it in your config, that should work.

@psteininger
Copy link
Author

I got this in target.exs:

config :shoehorn,
  init: [:nerves_runtime, :nerves_pack, :atecc508a],
  app: Mix.Project.config()[:app]

but I still see these errors:

22:18:19.561 [info]  Application nerves_hub_link exited: exited in: NervesHubLink.Application.start(:normal, [])
    ** (EXIT) exited in: GenServer.call(ATECC508A.Transport.I2CSupervisor, {:start_child, {{ATECC508A.Transport.I2CServer, :start_link, [["i2c-1", 96, :"Elixir.ATECC508A.Transport.I2C.i2c-1.96"]]}, :permanent, 5000, :worker, [ATECC508A.Transport.I2CServer]}}, :infinity)
        ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started

22:18:19.564 [info]  Application x509 exited: :stopped

22:18:19.578 [info]  Application phoenix_client exited: :stopped

22:18:19.579 [info]  Application websocket_client exited: :stopped

22:18:19.593 [info]  Application inets exited: :stopped

22:18:19.594 [info]  Application jason exited: :stopped

22:18:19.606 [info]  Application hackney exited: :stopped

22:18:19.608 [info]  Application metrics exited: :stopped

22:18:19.609 [info]  Application ssl_verify_fun exited: :stopped

22:18:19.610 [info]  Application parse_trans exited: :stopped

22:18:19.611 [info]  Application syntax_tools exited: :stopped

22:18:19.613 [info]  Application certifi exited: :stopped

22:18:19.614 [info]  Application mimerl exited: :stopped

22:18:19.615 [info]  Application idna exited: :stopped

22:18:19.617 [info]  Application unicode_util_compat exited: :stopped

22:18:19.618 [info]  Application fwup exited: :stopped

22:18:19.619 [info]  Application extty exited: :stopped

@psteininger
Copy link
Author

@fhunleth I think I know what's going on. It looks like the order of the dependencies matters. Here's what was causing problems:

iex(34)> cat "thirsty.app"
{application,thirsty,
             [{applications,[kernel,stdlib,elixir,logger,runtime_tools,
                             nerves_hub_link,nerves_time,nerves_key,toolshed,
                             shoehorn,ring_logger,circuits_i2c,circuits_gpio,
                             ads1115,gen_rmq,jason,wafer,nerves_runtime,
                             nerves_pack,busybox]},
              {description,"thirsty"},
             ...
              {registered,[]},
              {vsn,"0.1.0"},
              {mod,{'Elixir.Thirsty.Application',[]}}]}.

I have changed the dependency definitions to start like this:

 {:nerves, "~> 1.7.0", runtime: false},
      {:shoehorn, "~> 0.7"},
      {:circuits_i2c, "~> 0.3.7"},
      {:circuits_gpio, "~> 0.4.6"},
      {:nerves_key, "~> 0.5", optional: false},
      {:nerves_hub_cli, "~> 0.10.4", runtime: false},
      {:nerves_hub_link, "~> 0.9.2"},
      {:nerves_time, "~> 0.2"},
      {:toolshed, "~> 0.2.13"},

and for good measure added nerves_key to :extra_applications. It's hackey, but it did the trick.

@psteininger
Copy link
Author

psteininger commented Dec 4, 2020

I think I just realized what was going on. It's pretty embarrassing, actually. It turns out that I was building the prod version of the app, but uploading the dev version. I believe the dev build was misconfigured. This is what I was doing:

MIX_TARGET=rpi0 MIX_ENV=prod mix firmware && ./upload.sh nerves-ext-rpi0.local

In my attempt to combine two steps into one I unfortunately created a problem. The upload script has rpi0 as default for MIX_TARGET, but dev for MIX_ENV. I didn't realize that the env vars weren't propagated to the upload script, only to the mix task. This only became apparent after I blew away the _build directory.

I am sorry for wasting your time and I really appreciate you both jumping in and trying to help me out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants