Skip to content

fix: avoid double conns, better state tracking#1505

Merged
divagant-martian merged 6 commits into
mainfrom
sync-fix-parallel-connect
Sep 21, 2023
Merged

fix: avoid double conns, better state tracking#1505
divagant-martian merged 6 commits into
mainfrom
sync-fix-parallel-connect

Conversation

@Frando

@Frando Frando commented Sep 20, 2023

Copy link
Copy Markdown
Member

Description

On main the test sync_full_basic is flakey. This PR (hopefully) fixes it.

The reason was: We have the situation that two peers initiate connections to each other at (roughly) the same time. In #1491 this was sometimes prevented, but not reliably.

This PR fixes it by:

  • When connecting, we set a SyncState::Dialing for the (namespace, peer)
  • When accepting, if our own state is Dialing for the incoming request for (namespace, peer) we compare our peer id with that of the incoming request, and abort if ours is higher (doesn't matter which way, we care about a predictable outcome only
  • Through this, only one of the two simoultanoues connections will survive
  • Also added a Abort frame to the wire protocol to transfer to inform the dialer about the reason of the declined request, which is either "we do double sync, and will take the other conn" (AlreadySyncing) or "this replica is not here" (NotAvailable)

This PR also:

  • Further improves logging
  • Improves errors

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.

@Frando Frando force-pushed the sync-fix-parallel-connect branch 2 times, most recently from fe5d473 to 1ba1941 Compare September 20, 2023 23:47
@Frando Frando force-pushed the sync-fix-parallel-connect branch from 1ba1941 to 78263bb Compare September 20, 2023 23:50
@divagant-martian divagant-martian added this to the v0.6.0 milestone Sep 21, 2023

@divagant-martian divagant-martian left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only things that jump to me is doing explicit imports of std::result::Result if you are using anyhow::Result, setting the error will overwrite the generic so that's not necessary (if that's the reason)

Comment thread iroh-sync/src/net.rs Outdated
Comment thread iroh-sync/src/net/codec.rs Outdated
Comment thread iroh/src/sync_engine/live.rs Outdated
Comment thread iroh/src/sync_engine/live.rs Outdated
Comment thread iroh/src/sync_engine/live.rs Outdated
Comment thread iroh/src/sync_engine/live.rs Outdated
Comment thread iroh/src/sync_engine/live.rs Outdated
@divagant-martian divagant-martian added this pull request to the merge queue Sep 21, 2023
Merged via the queue into main with commit d8cc9df Sep 21, 2023
@dignifiedquire dignifiedquire deleted the sync-fix-parallel-connect branch November 1, 2023 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants