Metacello Dependency Version Best Practice

sean@clipperadams.com
Sun, Aug 4, 2024 3:22 AM

I recently experienced some CI problems for several of my projects because of an (IMO unnecessarily tight) version specification of XML-XPATH’s dependency on XMLParser: `github://pharo-contributions/XML-XMLParser:v3.6.x/src`. At minimum, it seems the dependency should be on the major version (e.g. v3), not the minor (unless I’m missing a needed feature that was added in 3.6).

More importantly though, it raises questions about what the best practice is here. In summary, it seems to me that in general it’s better to only pin versions in reaction to CI failures because pinning has significant downsides and uncertainties. Defensive pinning smells like premature optimization IMO.

In more detail, partially pinning (to major or minor version) by default doesn't seem right to me for several reasons unless we're talking about a tagged release that should be reproducible because:

  • We have limited community resources to manage these pinnings, which create cascading conflict problems with all other dependent projects.

  • Pinnings seem somewhat arbitrary because often community maintainers are not familiar enough with both projects or don't have enough time to do a thorough investigation. How does the maintainer know whether a particular dependency version really “works” with the project? Unless they are intimately familiar with both projects (and even then I wouldn’t have confidence in a manual review), the best way I can think of is to rely on passing CI, and in that case…

  • It seems easier to just react to CI failures; defensive pinning smells like premature optimization IMO

  • Partial pinning (minor or major version) will not lead to reproducible builds because the patch is floating

  • What does reproducibility even mean from an untagged baseline?

Looking through a bunch of repos, it doesn’t seem that there is consensus either way. Some specify baselines and some specific versions.

p.s. for tracking the major, I like ba-st’s naming convention of v{integer} instead of adding “.x”’s of unclear value

sean@clipperadams.com
Tue, Sep 3, 2024 12:58 AM

Anyone? It would be good to have some consensus, at least for community-contribution projects…

Guillermo Polito
Tue, Sep 3, 2024 7:47 AM

Hi Sean,

This is my take on this issue; do not take it as ground truth, it is far from that :)

TL;DR: forcing everybody to use major versions is not a solution: why do we have minor versions if we can only use major versions?

On Aug 4, 2024, at 5:22 AM, sean@clipperadams.com wrote:

> I recently experienced some CI problems for several of my projects because of an (IMO unnecessarily tight) version specification of XML-XPATH’s dependency on XMLParser: `github://pharo-contributions/XML-XMLParser:v3.6.x/src`. At minimum, it seems the dependency should be on the major version (e.g. v3), not the minor (unless I’m missing a needed feature that was added in 3.6).

I’ll start with a fact here. The developer chose a dependency on 3.6.
Two options here: it could have been a well-justified choice or not.

1. If justified, then there is no way to use the major version. The thing is, we don’t have proper support for managing version conflicts.

Let’s say we have projects A, B, C, and D forming a diamond.

  .----depends on v1 of ---> B ----depends on v1.x of ---.
 /                                                        \
A                                                          -> D
 \                                                        /
  '----depends on v1 of ---> C ----depends on v1.1 of ---'

From a semantic versioning point of view, 1.1 is compatible with 1.x.
So a good version resolution should load D v1.1 when loading A, right?

However, if you do this right now with the current state of Metacello, dependency management is delegated to git or the user.
Git knows nothing about semantic versioning and will report that 1.1 and 1.x are different refs, so there is a conflict.

=> My conclusion here is that semantic versioning is a tool

  • we need good semantic version resolution if we embrace it
  • if we forbid people to use semantic versioning, let’s stop using semantic versioning at all
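
To make the diamond above concrete, here is a minimal sketch in Python. It is not Metacello code and does not describe how Metacello works internally; it only contrasts comparing refs as opaque names with a semver-aware resolution. The function names and the list of available releases of D are made up for illustration, and a ref such as `v1.1` is assumed to mean "any patch of 1.1".

```python
# Hypothetical sketch: NOT Metacello's behaviour, only an illustration of
# semver-aware resolution versus comparing refs as opaque strings.

def parse_ref(ref):
    """Turn 'v1.x' into (1, None) and 'v1.1' into (1, 1); None is a wildcard."""
    parts = ref.lstrip("v").split(".")
    return tuple(None if part == "x" else int(part) for part in parts)


def matches(version, pattern):
    """A concrete version matches when every non-wildcard component is equal.

    Components missing from the pattern (e.g. the patch in 'v1.1') float.
    """
    return all(p is None or p == v for p, v in zip(pattern, version))


def resolve(refs, available):
    """Return the highest available version accepted by every ref, or None."""
    patterns = [parse_ref(ref) for ref in refs]
    candidates = [v for v in available if all(matches(v, p) for p in patterns)]
    return max(candidates) if candidates else None


def refs_conflict_as_strings(refs):
    """What a semver-unaware comparison sees: different ref names conflict."""
    return len(set(refs)) > 1


# Assumed (made-up) releases of D, as (major, minor, patch) tuples.
available_d = [(1, 0, 0), (1, 1, 0), (1, 1, 2), (2, 0, 0)]
refs_on_d = ["v1.x", "v1.1"]  # B's and C's requirements from the diamond

print(refs_conflict_as_strings(refs_on_d))  # True
print(resolve(refs_on_d, available_d))      # (1, 1, 2)
```

Compared as names, `v1.x` and `v1.1` clash; compared as version constraints, they agree on the 1.1 line, and the sketch picks its latest patch.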

An even worse example. What if B and C depend on incompatible versions of D? How would you resolve that issue?

  .----depends on v1 of ---> B ----depends on v2.x of ---.
 /                                                        \
A                                                          -> D
 \                                                        /
  '----depends on v1 of ---> C ----depends on v1.1 of ---'

Here it’s easy to blame either the developer of C (for not upgrading) or of A (for choosing a wrong configuration, but maybe it was his only choice!).
Or why not B’s developer, for “hasty upgrading”?
The thing is:

  • maybe C’s maintainer is not there anymore,
  • B was developed when D v2.x was already available, so people did not even think of developing it against older versions,
  • A did what he could with what was available…

=> My conclusion here is that life is complicated :)

  • besides good tooling and good criteria, complex cases will arise, what do we propose as a resolution?
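
As a rough illustration of why the second diamond has no automatic answer, the Python sketch below groups the dependents of D by the major version they require. It is a hypothetical helper, not part of Metacello, and it assumes that refs follow the `v<major>.<rest>` naming used in this thread and that different majors are incompatible, as semantic versioning prescribes.

```python
# Hypothetical sketch: detect requirements that no single version can satisfy.

def major_of(ref):
    """Extract the major number from a ref such as 'v2.x' or 'v1.1'."""
    return int(ref.lstrip("v").split(".")[0])


def incompatible_requirements(requirements):
    """Group dependents by required major version.

    Returns the grouping when more than one major is required (nothing can
    satisfy everyone), or an empty dict when all dependents agree on a major.
    """
    by_major = {}
    for project, ref in requirements.items():
        by_major.setdefault(major_of(ref), []).append(project)
    return by_major if len(by_major) > 1 else {}


# B's and C's requirements on D from the second diamond.
print(incompatible_requirements({"B": "v2.x", "C": "v1.1"}))  # {2: ['B'], 1: ['C']}
# The first diamond, for comparison: both stay within major 1.
print(incompatible_requirements({"B": "v1.x", "C": "v1.1"}))  # {}
```

A tool can at best report such a grouping. Deciding whether to upgrade C, downgrade B, or fork D remains a human decision.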

2. Let’s say that the developer’s choice was not justified.
Then why not engage with the developer to understand the reason for the decision?
Can the dependency be fixed without breaking anything?
There is no magic :)

> More importantly though, it raises questions about what the best practice is here. In summary, it seems to me that in general it’s better to only pin versions in reaction to CI failures because pinning has significant downsides and uncertainties. Defensive pinning smells like premature optimization IMO.

Why are you assuming somebody did a “defensive pinning”?
I mean, maybe it’s the case, but why are you assuming this was not justified?

> In more detail, partially pinning (to major or minor version) by default doesn't seem right to me for several reasons unless we're talking about a tagged release that should be reproducible because:

In general, I agree with the rule :)

But what if this was a conscious decision and not just a default thing?

I understand this does not suit your current needs, but I see a lot of implicit assumptions in my interpretation of your view.

> We have limited community resources to manage these pinnings, which create cascading conflict problems with all other dependent projects.

True

> Pinnings seem somewhat arbitrary because often community maintainers are not familiar enough with both projects or don't have enough time to do a thorough investigation. How does the maintainer know whether a particular dependency version really “works” with the project?

You test?

> Unless they are intimately familiar with both projects (and even then I wouldn’t have confidence in a manual review), the best way I can think of is to rely on passing CI, and in that case…

I don’t see what is the solution you propose here...

> It seems easier to just react to CI failures; defensive pinning smells like premature optimization IMO

What do you mean by reacting to CI failures?

> Partial pinning (minor or major version) will not lead to reproducible builds because the patch is floating

True

> What does reproducibility even mean from an untagged baseline?

True

> Looking through a bunch of repos, it doesn’t seem that there is consensus either way. Some specify baselines and some specific versions.

Yes. Another factor is that Metacello is used for two things at the same time:

  • loading development versions of projects with a more floating scheme...
  • trying to load a working “fixed” version

> p.s. for tracking the major, I like ba-st’s naming convention of v{integer} instead of adding “.x”’s of unclear value

This is aesthetics only, right? Anyways, I’d like to have a more “standardised” way :)

sean@clipperadams.com
Thu, Sep 5, 2024 1:55 AM

Thank you for the discussion. I am learning!

Replies inline…

Guillermo Polito wrote:

> forcing everybody to use major versions is not a solution: why do we have minor versions if we can only use major versions?

> > Unless they are intimately familiar with both projects (and even then I wouldn’t have confidence in a manual review), the best way I can think of is to rely on passing CI, and in that case…

> I don’t see what is the solution you propose here...

I agree! My gut is that the best default, especially given the limitations you describe in the tooling at this time, is to depend on baselines, not specific versions, unless one has an important reason not to (e.g. for a tagged release which ideally should be 100% reproducible), which in my experience is often not the case. I sense that we often reflexively specify versions because that is “the semantic versioning” way, even though our tooling does not really enable us to easily gain the benefits usually associated with semver. I only suggested that pinning to major versions would require somewhat fewer cascading changes than minor and patch pinning, of which there seems to be a lot without an expressed justification.

> Why are you assuming somebody did a “defensive pinning”?

I often see commits like “update to latest Xyz project version” and the commit changes v1.2.3 to v1.5.3. I find it difficult to believe, especially given the lack of commit message details to justify, that in all these cases the main project absolutely can’t work without the 1, 2 and 3 patches to 1.5. I feel it’s more likely a symptom of exactly what I’m pointing out and which you illustrated in your examples: unless different projects all point to the exact same version, Metacello will have problems, so just specify full versions everywhere.

> > It seems easier to just react to CI failures; defensive pinning smells like premature optimization IMO

> What do you mean by reacting to CI failures?

I mean that CI failures might be a useful guide to when we really need to pin versions based on clear evidence.
