# Binding affinity estimation from restrained umbrella sampling simulations

### Theoretical foundation

Binding affinity is often quantified using the equilibrium dissociation constant (*K*_{d}), defined as:

$$K_mathrmd = left[ mathrmP right]left[ mathrmL right]/left[ mathrmP:mathrmL right]$$

(1)

where [P], [L] and [P:L] are the concentrations of protein, ligand and the protein–ligand complex, respectively. Computationally, the absolute binding free energy (Δ*G*°), which is the standard molar free energy of binding, is more convenient to calculate. The dissociation constant and the absolute binding free energy are related via

$$Delta G^circ = RTln fracK_mathrmd1,mathrmM$$

(2)

where *R* is the gas constant, *T* is the temperature and 1 M is 1 molar concentration. Various strategies have been used to estimate Δ*G*°, some of which were briefly discussed above. The methodology proposed here has a notable resemblance to the stratification strategy of refs. ^{3,4}. However, the two methods have major differences as will be discussed later in this section.

Absolute binding free energy or Δ*G*° is the free-energy change associated with moving the ligand from the bulk to the binding pocket (Supplementary Table 1). Within the formalism presented in this work, Δ*G*° is determined from the grid PMF *G*(**x**), where **x** is the position of the ligand mass center from the center of the binding pocket (Supplementary Table 1), *G*(**x**) is the PMF associated with the ligand position **x**. In practice, we need to bin the three-dimensional space and define the PMF at every bin or grid point as:

$$Gleft( boldsymbolx right) = – RTln pleft( boldsymbolx right)$$

(3)

where *p*(**x**) is the probability of finding the ligand at bin **x**.

We define Δ*G*(**x**) = *G*(**x**) − *G*(**0**), where **x** = **0** (that is, the center of the binding pocket) is defined as the grid point associated with the lowest grid PMF. One can show:

$$Delta G^circ = – RTln frac{{int_mathrmpocket {mathrme^ – fracGleft( boldsymbolx right)RT} mathrmdV}}{{int_mathrmbulk {mathrme^{ – frac{Gleft( boldsymbolx right)}RT}} mathrmdV}} = – RTln frac{{int_mathrmpocket {mathrme^ – fracDelta Gleft( boldsymbolx right)RT} mathrmdV}}{{int_mathrmbulk {mathrme^ – fracDelta Gleft( boldsymbolx right)RT} mathrmdV}}$$

(4)

in which the binding ‘pocket’ refers to all **x** ∈ **V** where the ligand is considered bound and ‘bulk’ refers to all **x** ∈ **V** where the ligand is not interacting with the protein. **V** here is a subset of space with a single protein in standard concentration (that is, 1 M). As Δ*G*(**x**) is the same everywhere in the bulk, we can simplify relation (4) as follows:

$$Delta G^circ = – RTln fracV_mathrmP{{mathrme^{ – frac{Delta Gleft( boldsymbolx_mathrmB right)}RT}V_mathrmB}} = – Delta Gleft( boldsymbolx_mathrmB right) – RTln fracV_mathrmPV_mathrmB$$

(5)

where *V*_{B} is the bulk volume per protein associated with the standard concentration, **x**_{B} is any grid point in the bulk and *V*_{P} is the binding pocket volume defined as:

$$V_mathrmP = mathop intnolimits_mathrmpocket {mathrme^{ – fracDelta Gleft( boldsymbolx right)RT}} mathrmdV$$

(6)

Defining Δ*G*_{V} as the contribution of the difference between the volume of the binding pocket and the bulk to the binding free energy:

$$Delta G_mathrmV = – RTln fracV_mathrmPV_mathrmB$$

(7)

Combining equations (5) and (7), we have:

$$Delta G^circ = – Delta Gleft( boldsymbolx_mathrmB right) + Delta G_mathrmV$$

(8)

We can find the bulk volume (*V*_{B}) associated with the standard concentration for a single protein approximately as:

$$V_mathrmB = frac{frac1N_mathrmA,mathrmmol}{1,mathrmM} = frac1{N_mathrmA}mathrmL approx 1,661,mathrmAA^3$$

(9)

where *N*_{A} is Avogadro’s constant and L is the unit of volume (litres). We can now rewrite Δ*G*_{V} as:

$$Delta G_mathrmV = – RTln fracV_mathrmPV_mathrmB = – RTln frac{V_mathrmP}mathrmAA^3 + RTln fracV_mathrmBmathrmAA^3 = Delta G_mathrmP – Delta G_mathrmB$$

(10)

in which Δ*G*_{B} is the bulk volume contribution and Δ*G*_{P} is the binding pocket contribution:

$$left{ {beginarray*20l {Delta G_mathrmB = – RTln frac{{V_mathrmB}}{mathrmAA^3} approx – 7.42RT} hfill \ {Delta G_mathrmP = – RTln frac{{V_mathrmP}}{{mathrmAA^3}} = – RTln mathop intnolimits_mathrmpocket {mathrme^{ – frac{Delta Gleft( boldsymbolx right)}RT}} fracmathrmdV{{mathrmAA^3}}} hfill endarray} right.$$

(11)

Determining both Δ*G*(**x**_{B}) and Δ*G*_{P} requires finding the grid PMF Δ*G*(**x**). Δ*G*(**x**_{B}) is the PMF difference between the binding pocket center and the bulk and Δ*G*_{P} also requires an estimate for Δ*G*(**x**) within the binding pocket. We therefore do not need to find Δ*G*(**x**) for all **x** if we have a good estimate for Δ*G*(**x**) within the binding pocket and in the bulk. Ideally, Δ*G*(**x**) for these points can be determined by pulling the ligand out of the binding pocket towards the bulk and using an enhanced sampling technique such as US to sample the space of a collective variable such as *d*, that is, the distance between the mass centers of the ligand and protein. Δ*G*(**x**) can be estimated for all sampled grid points **x** using this distance-based US simulation. Note that the collective variable used for biasing would be *d*, while the collective variable used for the PMF calculations would be the three-dimensional position vector of the mass center of ligand with respect to protein’s binding pocket center. One may estimate the grid PMF from the distance-based US simulations using a non-parametric reweighting algorithm as discussed in this section. Δ*G*(**x**) can also be used to estimate Δ*G*_{P} as defined in relation (11). There is often no need to strictly define the binding pocket as only low Δ*G*(**x**) values have non-negligible contribution to *V*_{P} and thus even if we include all sampled grid points, only those close to the binding pocket center have non-negligible contributions.

A practical issue with determining Δ*G*(**x**_{B}) is the convergence. The key obstacles for the sampling that slow down the convergence are the orientation of the ligand, and the conformational changes of the ligand and protein. Using an approach similar in spirit to the previously proposed stratification strategy^{3,4,24}, we can circumvent extensive sampling of these degrees of freedom. Let us first focus on the orientation of the ligand (*Ω*), defined using the orientation quaternion formalism. We can restrain *Ω* during the distance-based US simulations using a biasing potential ((frac12kvarOmega ^2) where a *k* is harmonic force constant) and later correct the free-energy difference based on the PMF associated with the *Ω*, which is different in the bulk (*F*(**x**_{B}, *Ω*)) and in the binding pocket (*F*(**0**, *Ω*)). More generally, for any grid point **x**, we may determine Δ*G*(**x**) based on the PMF associated with the *Ω* at **x** (*F*(**x**, *Ω*)) and **0** (*F*(**0**, *Ω*)):

$$mathrme^ – fracDelta G(boldsymbolx)RT = frac{{int_!0^uppi {mathrme^ – fracF(boldsymbolx,varOmega )RTmathrmdvarOmega } }}{{int_!0^uppi {mathrme^ – fracF(boldsymbol0,varOmega )RTmathrmdvarOmega } }}$$

(12)

Note that *F*(**x**, *Ω*) is the PMF associated with **x** and *Ω*, defined such that:

$$Gmathrm(boldsymbolxmathrm) = c – RTln mathop intnolimits_!0^uppi {mathrme^{ – fracF(boldsymbolx,varOmega )RT}mathrmdvarOmega }$$

(13)

where *c* is an arbitrary constant. We therefore have:

$$mathrme^{ – frac{Delta G(boldsymbolx)}RT} = frac{{int_!0^uppi {mathrme^{ – frac{F(boldsymbolx,varOmega )}RT}mathrmdvarOmega } }}{{int_!0^uppi {mathrme^ – fracF(boldsymbolx,varOmega ) + frac12kvarOmega ^2RTmathrmdvarOmega } }} times frac{{int_!0^uppi {e^ – fracF(boldsymbol0,varOmega ) + frac12kvarOmega ^2RTmathrmdvarOmega } }}{{int_!0^uppi {e^{ – fracF(boldsymbol0,varOmega )RT}mathrmdvarOmega } }} times frac{{int_!0^uppi {mathrme^{ – fracF(boldsymbolx,varOmega ) + frac12kvarOmega ^2RT}mathrmdvarOmega } }}{{int_!0^uppi {mathrme^{ – frac{F(boldsymbol0,varOmega ) + frac12kvarOmega ^2}RT}mathrmdvarOmega } }}$$

(14)

We now define *G*_{Ω}(**x**) as the grid PMF of the restrained system (by *Ω*):

$$G_varOmega (boldsymbolx) = – RTln mathop intnolimits_!0^uppi {mathrme^{ – frac{F(boldsymbolx,varOmega ) + frac12kvarOmega ^2}RT}mathrmdvarOmega }$$

(15)

We also define *U*_{Ω}(**x**) as the average biasing potential at grid point **x**:

$$U_varOmega left( boldsymbolx right) = – RTln langle mathrme^ – fracfrac12kvarOmega ^2RT rangle _boldsymbolx = – RTln frac{{int_0^uppi {mathrme^{ – frac{{Fleft( boldsymbolx,varOmega right) + frac12kvarOmega ^2}}RT}} mathrmdvarOmega }}{{int_0^uppi {e^{ – frac{{Fleft( boldsymbolx,varOmega right)}}RT}} mathrmdvarOmega }}$$

(16)

Now we have from relations (14), (15) and (16):

$$Delta Gleft( boldsymbolx right) = Delta G_varOmega left( boldsymbolx right) – Delta U_varOmega left( boldsymbolx right)$$

(17)

where, the free energy of grid point **x** from the center **0** (Δ*G*(**x**)) is calculated based on its equivalent free energy (Δ*G*_{Ω}(**x**)) in a system biased by a harmonic restraint on *Ω* and a correction term Δ*U*_{Ω}(**x**). For **x** = **x**_{B}:

$$Delta U_varOmega left( boldsymbolx_mathrmB right) = – RTln frac{{langle mathrme^ – fracfrac12kvarOmega ^2RT rangle _mathrmbulk}}{{langle mathrme^ – fracfrac12kvarOmega ^2RT rangle _mathrmpocket}}$$

(18)

To determine the above ensemble averages, we need to determine the PMF along *Ω* for the bound and unbound ligand and calculate the ensemble averages analytically using relation (16). Δ*G*_{Ω}(**x**_{B}) can be determined from PMF calculations, where the distance between the protein and ligand is varied and the orientation of the ligand is restrained (distance-based BEUS with restrained orientation). We note that:

$$V_mathrmP = mathop intnolimits_mathrmpocket {mathrme^{ – frac{{Delta Gleft( boldsymbolx right)}}RT}} mathrmdV = mathop intnolimits_mathrmpocket {mathrme^{ – frac{Delta G_varOmega left( boldsymbolx right) – Delta U_varOmega left( boldsymbolx right)}RT}} mathrmdV approx mathop intnolimits_mathrmpocket {mathrme^{ – frac{Delta G_varOmega left( boldsymbolx right)}RT}} mathrmdV$$

(19)

where we assume Δ*U*_{Ω}(**x**) is negligible for **x** within the binding pocket. In other words, (langle {mathrme^ – fracfrac12kvarOmega ^2RT} rangle _boldsymbolx approx langle {mathrme^ – fracfrac12kvarOmega ^2RT} rangle _boldsymbol0) for **x** close to **0**.

In brief, if we choose to restrain the orientation, our absolute binding-free-energy estimate includes the following terms (using relations (8) and (17)):

$$Delta G^circ = – Delta G_varOmega left( boldsymbolx_mathrmB right) + Delta U_varOmega left( {boldsymbolx_mathrmB} right) + Delta G_mathrmV$$

(20)

*F*(**x**_{B}, *Ω*) can be calculated numerically from orientation angle distribution of a free ligand: (F(boldsymbolx_mathrmB,varOmega ) = – RTln pleft( varOmega right),) where *p*(*Ω*) is determined from the distribution of Euler angles ((pleft( phi ,theta ,psi right) = frac18uppi ^2sin theta), where 0 ≤ *ϕ*, *ψ* ≤ 2π and 0 ≤ *θ* ≤ π) given that:

$$cos fracvarOmega 2 = cos fracphi 2cos fractheta 2cos fracpsi 2 + sin fracphi 2sin fractheta 2sin fracpsi 2$$

(21)

(langle {mathrme^ – fracfrac12kvarOmega ^2RT} rangle _{mathrmbulk}) can then be calculated using relation (16) with numerically estimated *F*(**x**_{B}, *Ω*) and the *k* value used in the simulations. (Fleft( {boldsymbolx_mathrmB,varOmega } right) = – RTln p(varOmega )) was numerically estimated by discretizing each of the 3 Euler angles with a bin width of 1° and a total of 360 **×** 360 **×** 180 bins to estimate *p*(*Ω*) from *p*(*ϕ*, *θ*, *ψ*). *F*(**0**, *Ω*) can be determined approximately using orientation-based US simulations of bound ligand. *F*(**0**, *Ω*) can then be used to estimate (langle {mathrme^{ – fracfrac12kvarOmega ^2RT}} rangle _mathrmpocket) using relation (16).

The above strategy can be extended to other degrees of freedom for which unbiased sampling may hinder the convergence. Most notably, the internal conformational changes of the ligand and that of the protein may also play a crucial role in slowing down the convergence. In the following, we show how one can restrain not only the orientation of the ligand but also the RMSD of the ligand (denoted here by *r*) in distance-based US simulations (along *d*) to speed up convergence. In this case, the grid PMF difference Δ*G*(**x**) is calculated based on Δ*G*_{Ω,r}(**x**), the grid PMF of a system whose *Ω* and *r* are both restrained:

$$mathrme^{ – frac{{Delta Gleft( boldsymbolx right)}}RT} = frac{{int_!0^infty {int_0^uppi {mathrme^{ – frac{{Fleft( {boldsymbolx,varOmega ,r} right)}}RT}} } mathrmdvarOmega mathrmdr}}{{int_!0^infty {int_!0^uppi {mathrme^{ – fracFleft( boldsymbol0,varOmega ,r right)RT}} } mathrmdvarOmega mathrmdr}}$$

(22)

Using a similar strategy as in relation (14), we have:

$$beginarraylmathrme^ – fracDelta Gleft( x right)RT = frac{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{Fleft( {boldsymbolx,Omega ,r} right)}}RT}mathrmdvarOmega mathrmdr}}{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{Fleft( {boldsymbolx,varOmega ,r} right) + frac12k^prime r^2}}RT}mathrmdvarOmega mathrmdr}} times frac{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{Fleft( {boldsymbolx,varOmega ,r} right) + frac12k^prime r^2}}RT}mathrmdvarOmega mathrmdr}}{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{Fleft( {boldsymbolx,varOmega ,r} right) + frac12k^prime r^2 + frac12kvarOmega ^2}}RT}mathrmdvarOmega mathrmdr}}\ times frac{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^ – fracFleft( 0,varOmega ,r right) + frac12k^prime r^2 + frac12kvarOmega ^2RTmathrmdvarOmega mathrmdr}}{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – fracFleft( boldsymbol0,varOmega ,r right) + frac12k^prime r^2RT}mathrmdvarOmega mathrmdr}} times frac{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{Fleft( boldsymbol0,varOmega ,r right) + frac12k^prime r^2}}RT}mathrmdvarOmega mathrmdr}}{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{Fleft( boldsymbol0,varOmega ,r right)}}RT}mathrmdvarOmega mathrmdr}}\ times frac{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{Fleft( {boldsymbolx,varOmega ,r} right) + frac12k^prime r^2 + frac12kvarOmega ^2}}RT}mathrmdvarOmega mathrmdr}}{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{Fleft( boldsymbol0,varOmega ,r right) + frac12k^prime r^2 + frac12kvarOmega ^2}RT}mathrmdvarOmega mathrmdr}}endarray$$

(23)

which results in:

$${rme^{ – fracDelta Gleft( bfx right)RT}} = frac{{{{langle {rme^ – fracfrac12k^prime r^2RT}rangle }_bf0}}}{{{{langle {rme^{ – fracfrac12k^prime r^2RT}}rangle }_bfx}}} times frac{{{{langle {rme^{ – fracfrac12kvarOmega ^2RT}}rangle }_}0^r}}{{langle {rme^{ – frac{{frac12kvarOmega ^2}}RT}}rangle _bfx^r}} times frac{{rme^ – beta G_varOmega ,r(bfx)}}{{rme^ – beta G_varOmega ,r(bf0)}}$$

(24)

Here we have defined *G*_{Ω,r}(**x**) as:

$$G_varOmega ,rleft( boldsymbolx right) = – RTln mathop intnolimits_0^infty {mathop intnolimits_0^uppi {mathrme^{ – frac{{Fleft( {boldsymbolx,varOmega ,r} right) + frac12k^prime r^2 + frac12kvarOmega ^2}}RT}} }$$

(25)

where *k*′ is the harmonic force constant associated with the *r* based on biasing potential ((frac12k^prime r^2)). We also define *U*_{r}(**x**) similar to *U*_{Ω}(**x**) in relation (15) except for using *r* instead of *Ω*. (U_varOmega^r(boldsymbolx)) is also defined similar to *U*_{Ω}(** x**) except for the additional restraint on

*r*:

$$U_varOmega ^rleft( x right) = – RTln langle {mathrme^{ – frac{{frac12kvarOmega ^2}}RT}} rangle _boldsymbolx^r = – RTln frac{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{Fleft( {boldsymbolx,varOmega ,r} right) + frac12k^prime r^2 + frac12kvarOmega ^2}}RT}mathrmdvarOmega mathrmdr}}{{mathop smallint nolimits_0^infty mathop smallint nolimits_0^uppi mathrme^{ – frac{{left( {Fleft( {boldsymbolx,varOmega ,r} right) + frac12k^prime r^2} right)}}RT}mathrmdvarOmega mathrmdr}}$$

(26)

Finally, we have:

$$Delta Gleft( boldsymbolx right) = Delta G_{varOmega ,r}left( {boldsymbolx} right) – Delta U_rleft( {{{boldsymbolx}}} right) – Delta U_{varOmega }^rleft( {{{boldsymbolx}}} right)$$

(27)

In brief, if we choose to restrain both the orientation and RMSD, our absolute binding-free-energy estimate includes the following terms:

$$Delta G^circ = – Delta G_varOmega ,rleft( {{{{boldsymbolx}}}_mathrmB} right) + Delta U_rleft( {{{{boldsymbolx}}}_mathrmB} right) + Delta U_varOmega ^rleft( {{{{boldsymbolx}}}_mathrmB} right) + Delta G_mathrmV$$

(28)

Here we are using an approximation similar to that in relation (19):

$$V_mathrmP approx mathop intnolimits_mathrmpocket {mathrme^{ – frac{{{{Delta }}G_{{{varOmega }},r}left( {{{boldsymbolx}}} right)}}{RT}}} mathrmdV$$

(29)

Using relations (20) and (28), we can generalize the stratification strategy to include three restraints on arbitrary collective variables *α*, *β* and *γ*:

$$Delta G^circ = – Delta G_alpha ,beta ,gamma left( {{{{boldsymbolx}}}_mathrmB} right) + Delta U_gamma left( {{{{boldsymbolx}}}_mathrmB} right) + Delta U_beta ^gamma left( {{{{boldsymbolx}}}_mathrmB} right) + Delta U_alpha ^beta ,gamma left( {{{{boldsymbolx}}}_mathrmB} right) + Delta G_mathrmV$$

(30)

where:

$$Delta G_mathrmV approx – RTln mathop intnolimits_{{mathrmpocket}} {mathrme^{ – frac{{Delta G_alpha ,beta ,gamma ({{{boldsymbolx}}})}}{{RT}}}} frac{mathrmdV}{{{mathrmAA}^3}} – Delta G_mathrmB$$

(31)

### Isothermal titration calorimetry of hFGF1 with heparin hexasaccharide

ITC data were obtained using MicroCal iTC 200 (Malvern) with microcal origin software. The change in heat during the biomolecular interaction was measured by titrating heparin (loaded in the syringe) into the hFGF1 solution in the calorimetric cell. Both the protein and the heparin samples were prepared in the buffer containing 10 mM phosphate buffer with 100 mM NaCl at pH 7.2 and were degassed before loading. The protein-to-heparin ratio was maintained at 1:10 with the protein concentration being 100 μM and the heparin concentration being 1 mM. A total of 30 injections were conducted with a constant temperature of 25 °C and stirring speed of 300 rpm. One set of sites binding model was used for the ITC binding curve^{68}. The standard binding free energy Δ*G*° was determined from dissociation constant via relation (2) at *T* = 25 °C. The experiment was repeated three times with the same sample and the results obtained were very similar to each other. The mean and standard deviation were reported for both *K*_{d} and Δ*G*°.

### All-atom MD simulations

For our bound state, we utilized the X-ray crystal structure of the dimeric hFGF1 combination with heparin hexasaccharide (PDB 2AXM; resolution, 3.0 Å)^{69}, and for our apo state, we used the X-ray crystal structure of monomeric hFGF1 (PDB 1RG8; resolution, 1.1 Å)^{70}. The NAMD 2.13 (ref. 71) was used to run MD simulations. Using a conjugate gradient, we energy-minimized the system for 10,000 steps. We next relaxed the systems using stepwise restrained MD simulations (for 1 ns) using CHARMM-GUI^{72}. All production runs were done in an NPT (constant N, number of atoms; P, pressure; T, temperature) ensemble after the first NVT (constant N, number of atoms; V, volume; T, temperature) relaxation. Simulations were done at 300 K with a 2 fs time step and a 0.5 ps^{−1} damping coefficient using a Langevin integrator. Nosé–Hoover–Langevin pistons were used to maintain 1 atm pressure^{72}. Long-range electrostatic interactions were estimated using the particle mesh Ewald approach. The initial runs were done for 15 ns, followed by the production run on the Anton 2 supercomputer (Pittsburgh Supercomputing Center) for 4.8 μs with a 2.5 fs time step.

### MD simulations of free heparin hexasaccharide

Heparin hexasaccharide^{69} was simulated in a rectangular water box without the protein. The system was set up as described previously in the ‘All-atom MD simulations’ section. The final conformation after relaxation was then used as the starting conformation for 10 production runs for 40 ns each. The total simulation time was around 400 ns.

### SMD simulations

The final conformations of the hFGF1–heparin^{73}, apo hFGF1 (ref. 73) and free heparin hexasaccharide equilibrium simulations were used to generate starting conformations for the non-equilibrium pulling simulations. Four collective variables^{74} were used for the SMD simulations^{75}: (1) distance between the heavy-atom center of mass of heparin and that of the protein (*d*); (2) the orientation angle of heparin with respect to the protein (*Ω*) defined using the orientation quanternion formalism; (3) RMSD of the protein (*r*_{P}); (4) RMSD of heparin (*r*_{L}). Six independent sets of simulations were performed. The distance-based SMD simulation was run for 9.5 ns, while the orientation-based SMD simulation was run for 8 ns. The distance-based SMD simulation was used to pull the heparin away from the protein by approximately 30 Å (10 Å **→** 40 Å) with a force constant of 100 kcal (mol Å^{2})^{−1}. The orientation angle was also restrained in this simulation with a force constant of 0.5 kcal (mol degree^{2})^{−1} to stay close to its initial orientation in the bound state. The orientation-based SMD simulation was used to rotate the bound heparin locally with respect to the protein (0° → 73°) with a force constant of 100 kcal (mol degree^{2})^{−1}. Four RMSD-based SMD simulations were run for 10 ns each using a force constant of 50 kcal (mol Å^{2})^{−1}: (1) to change the RMSD of the bound protein (0.5 Å→2 Å) (the RMSD of heparin was restrained in this simulation with a force constant of 1 kcal (mol Å^{2})^{−1}); (2) to change the RMSD of the bound heparin (1.5 Å → 4 Å); (3) to change the RMSD of the unbound protein (0.8 Å → 3.2 Å); (4) to change the RMSD of the free heparin (1.5 Å → 5.5 Å).

### BEUS simulations

BEUS^{53,76,77}, which is a variation of the US simulation method, was performed to estimate grid PMF (Supplementary Fig. 1). Four independent sets of distance (*d*)-based BEUS simulations were performed, with no restraints, restraint on orientation angle of heparin with respect to the protein (*Ω*), restraint on RMSD of the ligand (*r*_{L}) and RMSD of the protein (*r*_{P}), and restraints on *Ω*, *r*_{L} and *r*_{P}. Two sets of BEUS simulations were also performed using the *Ω* collective variable, one with and one without a restraint on *r*_{L} and *r*_{P}. In addition, two sets of BEUS simulations were performed using the *r*_{P} collective variable (bound protein with restraint on *r*_{L}; unbound protein) and two sets were performed using the *r*_{L} collective variable (bound ligand; free ligand). Selected SMD conformations were assigned to individual BEUS windows with equal spacing in each one of these BEUS simulations. The distance-based BEUS simulations ran for 10 ns with 31 replicas/windows and the orientation-based simulations ran for 10 ns with 30 replicas/windows. The RMSD-based BEUS simulations ran for 10 ns with 12 replicas/windows. The force constant used for ligand–protein distance (*d*) in distance-based BEUS was 2 kcal (mol Å^{2})^{−1} while the orientation was restrained as in SMD simulations using a force constant of 0.5 kcal (mol degree^{2})^{−1}. For orientation-based BEUS simulations, the force constant for the ligand orientation angle (as in SMD simulations) was set to 0.5 kcal (mol degree^{2})^{−1}. The force constant used for *r*_{L} and *r*_{P} in all cases was 1 kcal (mol Å^{2})^{−1}. See Supplementary Fig. 1 for a schematic representation of these simulations.

### Free-energy calculations using non-parametric reweighting

Once the BEUS simulations described above were converged, a non-parametric reweighting method^{76,78}, which is somewhat similar to the multi-state Bennett acceptance ratio method^{79}, was used to construct the PMF. In this method^{76}, each sampled configuration will be assigned a weight, which can be used to construct the PMF in terms of a desired collective variable. Suppose that a system is biased (for instance, within a BEUS scheme) using *N* different biasing potentials *U*_{i}(**r**), where *i* = 1, …, *N*, and **r** represents all atomic coordinates. Typically, *U*_{i}(**r**) is a harmonic potential defined in terms of a collective variable with varying centers for different *i*. Assuming an equal number of sampled configurations from each of the *N* generated trajectories, we can combine them in a single set of samples **r**_{k} (irrespective of which bias was used to generate each sample **r**_{k}) and determine the weight of each sample (*w*_{k}) as:

$$w_k = c/mathop sumlimits_i {mathrme^{ – frac{{left( {U_ileft( boldsymbolr_k right) – F_i} right)}}{{RT}}}}$$

(32)

where, *c* is the normalization constant such that (mathop sumnolimits_k w_k = 1) and both *w*_{k} and perturbed free energies *F*_{i} are determined iteratively using the above equation and the following:

$$mathrme^ – beta F_i = mathop sumlimits_k {w_kmathrme^{ – frac{{U_ileft( boldsymbolr_k right)}}{{RT}}}}$$

(33)

Converged *w*_{k} values can be used to construct any ensemble averages including any PMF (for example, *G*(**ζ**) PMF of the atomic system in the collective variable space(**ζ**))) in terms of not only the collective variable used for biasing but also any other collective variables that are sufficiently sampled. One may use a weighted histogram method to construct the PMF as follows:

$$Gleft( boldsymbolupzeta _i right) = – RTln mathop sumlimits_k w_k delta left( {boldsymbolupzeta left( {boldsymbolr_k} right) – boldsymbolupzeta _i} right),$$

(34)

$$delta left( {boldsymbolupzeta left( {{{boldsymbolr}}_k} right) – boldsymbolupzeta _i} right) = left{ {beginarray*20l 1, hfill & {left| {boldsymbolupzeta left( {{{{boldsymbolr}}}_k} right) – {{boldsymbolupzeta }}_i} right| < left| {{{{boldsymbolupzeta }}}left( {{{{boldsymbolr}}}_k} right) – {{{boldsymbolupzeta }}}_j} right|mathrmfor,j ne i.} hfill \ 0, hfill & mathrmotherwise hfill endarray} right.$$

(35)

To estimate the uncertainty of any of PMF calculations described above, one may use bootstrapping. Here, we have used a block Bayesian bootstrapping technique^{77}, where 100 alternative datasets are resampled from the existing dataset and the same non-parametric reweighting algorithm and the same PMF calculation is repeated for each set to generate 100 alternative PMFs. The standard deviation of the PMF at any point along the reaction coordinate provides an estimate for the error.

### Alchemical FEP simulations

We used the BFEE2 (ref. 27) package to estimate the absolute free energy of binding in silico for an alchemical or geometrical route with multiple subprocesses and geometric constraints. Alchemical FEP simulations were performed to calculate the absolute binding free energy for the interaction of hFGF1 with heparin hexasaccharide. We used a double annihilation protocol^{80}, wherein the heparin hexasaccharide is annihilated in both the free and bound states. The final conformations of the hFGF1–heparin complex^{73} and free heparin hexasaccharide equilibrium simulations (discussed previously in the ‘All-atom MD simulations’ section) were used to generate starting conformations for the bound hFGF1–heparin and free heparin FEP simulations respectively. For the alchemical route, four separate simulations are performed: (1) coupling the restraints of seven collective variables in the bound state; (2) decoupling the ligand alchemically in the bound state; (3) coupling the ligand alchemically in the unbound state; (4) decoupling the conformational restraints in the unbound state. The FEP simulations 1 and 3 were performed bidirectionally using 200 *λ*-windows (*λ* is the coupling parameter associated with the FEP that could vary between 0 and 1). Each *λ*-window included a 0.5 ns of equilibration and 5.0 ns of averaging for both the unbound and bound states, for a total of 2.3 μs (Supplementary Table 3). The decoupling FEP simulations 2 and 4 were also performed bidirectionally, each one for 51 ns. All FEP simulations were performed using the NAMD 2.13 (ref. 71) simulation package with the CHARMM36m all-atom additive force field, using the protocol discussed previously for the equilibrium simulations. We used the state-of-the-art BFEE2 (ref. 27) method to make input files and analyze the FEP simulations.

### Binding-free-energy calculations using geometrical route

The extended ABF technique with an umbrella integration estimator was used to calculate the free-energy change along the coarse variables required to characterize reversible heparin–hFGF1 binding^{3,24,27}. We used the software BFEE2 (ref. 27) to generate the input files for these simulations. In the geometrical route, these collective variables are often subjected to restrictions, and the amount of reversible work required to impose each constraint is determined by a sequence of very accurate PMF simulations. The collective variables used here are the RMSDs of the two proteins’ backbone distances from the reference, native conformation, the three Euler angles (*Θ*, *Φ* and *Ψ*) that describe their relative orientation and the polar (*θ*) and azimuth angles (*φ*) that describe their relative position^{27,81}. The geometrical path consists of a sequence of separate PMF computations performed sequentially with the gradual inclusion of restrictions (RMSD, *Θ*, *Φ*, *Ψ*, *θ* and *φ*), as shown in Supplementary Table 2. Each geometric collective variable (RMSD, Θ, *Φ*, Ψ, *θ*, *φ* and *r* = (1/*β*) ln(*S***I***C*°); *β* = (𝑘_{B}*T*)^{−1}, with *k*_{B} the Boltzmann constant and *T* the temperature; *C*° denotes the standard concentration of 1 M. *I**, which stands for the separation term, and *S**, which stands for the surface term, indicate the percentage of a sphere with radius *r**, centered at the binding site of the reference protein, that is, accessible to its partner) simulation was run with 10 replicas per restriction, and each replica simulation included 20 ns (RMSD, *Θ*, *Φ*, *Ψ*, *θ* and *φ*) of simulation time (*r* collective variables simulations were run for 40 ns for each replica), for a total of 1.6 μs. The BFEE2 (ref. 27) Gui was used to analyze the final ABF simulation data.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.