Sunday, July 6, 2025

How Soon Might Humans Be Replaced At Work

As noted by Thomas Claburn in The Register, there seems to be a contradiction between two pieces of research relating to the development and use of AI in business organizations.

On the one hand, teams of researchers have developed benchmarks to study the effectiveness of AI agents, and have found success rates of between 25% and 40%, depending on the situation.

On the other hand, Gartner predicts that more than 40% of agentic AI projects will be cancelled by the end of 2027, due to rising costs, unclear business value, or insufficient risk controls. If we interpret not-being-cancelled as a marker for success, this implies that business executives are expecting a success rate nearer to 60%.

 

History tells us that the adoption of technology to perform work is only partially dependent on the quality of the work, and can often be driven more by cost. The original Luddites protested at the adoption of machines to replace textile workers, but their argument was largely based on the inferior quality of the textiles produced by the machines. It was only later that this label was attached to anyone who resisted technology on principle.

Around ten years ago, I attended a debate on artificial intelligence sponsored by the Chartered Institute of Patent Agents. In my commentary on this debate (How Soon Might Humans Be Replaced At Work?) I noted that decision-makers may easily be tempted by short-term cost savings from automation, even if the poor quality of the work results in higher costs and risks in the longer term.

In their look at the labour market potential of AI, Tyna Eloundou et al note that

A key determinant of their utility is the level of confidence humans place in them and how humans adapt their habits. For instance, in the legal profession, the models’ usefulness depends on whether legal professionals can trust model outputs without verifying original documents or conducting independent research. ... Consequently, a comprehensive understanding of the adoption and use of LLMs by workers and firms requires a more in-depth exploration of these intricacies.

However, while levels of confidence and trust can be assessed by surveying people's opinions, such surveys cannot assess whether these levels of confidence and trust are justified. Graham Neubig told The Register that this was what prompted the development of a more objective benchmark for AI effectiveness.


Thomas Claburn, AI agents get office tasks wrong around 70% of the time, and a lot of them aren't AI at all (The Register, 29 June 2025)

Tyna Eloundou, Sam Manning, Pamela Mishkin and Daniel Rock, GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models (August 2023)

Wikipedia: Luddite 

Related posts: How Soon Might Humans Be Replaced At Work? (November 2015), RPA - Real Value or Painful Experimentation? (August 2019)

Thursday, March 20, 2025

Machine Indoctrination

March 2013 

From my post on Enabling Prejudices

One of the key insights of the early work on Design Thinking (Bryan Lawson, Peter Rowe) was the importance of heuristics, or what Rowe (following Gadamer) calls enabling prejudices, which will hopefully get us to a good-enough solution more quickly. 

As Christopher Alexander notes:

At the moment when a person is faced with an act of design, he does not have time to think about it from scratch. The Timeless Way of Building p 204

We always approach a problem with a set of prejudices or prejudgements. Depending on the situation, these may either help us to solve the problem more quickly (enabling), or may lead us astray (disabling). The acid test of a set of heuristics or design principles is that they are mostly enabling most of the time.


March 2025

Design schools indoctrinate their students in ways of solving design problems efficiently. If you are trained to follow a particular design approach - for example Bauhaus - this greatly reduces the complexity of the task, because it rules out a vast number of solutions that would be anathema to a Bauhausian.

Algorithms may be able to calculate vastly larger sets of options than a human designer, but as Adam Nocek explained yesterday in a talk at Goldsmiths, machine intelligence is subject to mathematical limitations on computability. I touched on this topic in my guest editorial for a journal special issue on Algorithms in 2023, but his argument was much more comprehensive and wide-ranging, linking to important questions of agency and subjectivity.

Many researchers have noted the prevalence of algorithmic bias, but if we accept the importance of heuristics and intuition in the design process, there are much more fundamental problems here. 

 

to be continued ...

 


Christopher Alexander, The Timeless Way of Building (New York: Oxford University Press, 1979)

Dan Klyn, Skirmishing With Ill-Defined and Wicked Problems (TUG, 5 July 2013) - review of Rowe

Bryan Lawson, How Designers Think (1980, 4th edition 2005)

Adam Nocek, Designing in the age of artificial machines and Whitehead (Talk at Goldsmiths University, 19 March 2025)

Peter Rowe, Design Thinking (MIT Press 1987)

Richard Veryard, As we may think now (Subjectivity 2023) 


Related posts: From Sedimented Principles to Enabling Prejudices (March 2013), From Enabling Prejudices to Sedimented Principles (March 2013), Limitations of Machine Learning (July 2020)

 

Sunday, November 3, 2024

Influencing the Habermas Machine

In my previous post Towards the Habermas Machine, I talked about a large language model (LLM) developed by Google DeepMind for generating a consensus position from a collection of individual views, named after Jürgen Habermas.

Given that democratic deliberation relies on knowledge of various kinds, followers of Habermas might be interested in how knowledge is injected into discourse. Habermas argued that mutual understanding was dependent upon a background stock of cultural knowledge that is always already familiar to agents, but this clearly has to be supplemented by knowledge about the matter in question.

For example, we might expect a discussion about appropriate speed limits to be informed by reliable or unreliable beliefs about the effects of a given speed limit on journey times, accident rates, pollution, and so on. In traditional discussion forums, it is extremely common for people to present themselves as having some special knowledge or authority, which supposedly gives extra weight to their opinions, and we might expect something similar to happen in a tech-enabled version.

For many years, the Internet has been distorted by Search Engine Optimization (SEO), which means that the results of an internet search are largely driven by commercial interests of various kinds. Researchers have recently raised a similar issue in relation to large language models, namely Generative Engine Optimization (GEO). Meanwhile, other researchers have found that LLMs (like many humans) are more impressed by superficial jargon than by proper research.

So we might reasonably assume that various commercial interests (car manufacturers, insurers, oil companies, etc) will be looking for ways to influence the outputs of the Habermas Machine on the speed limit question by overloading the Internet with knowledge (regime of truth) in the appropriate format. Meanwhile the background stock of cultural knowledge is now presumably co-extensive with the entire Internet.

Is there anything that the Habermas Machine can do to manage the quality of the knowledge used in its deliberations?


Footnote: Followers of Habermas can't agree on the encyclopedia entry, so there are two rival versions.

Footnote: The relationship between knowledge and discourse goes much wider than Habermas, so interest in this question is certainly not limited to his followers. I might need to write a separate post about the Foucault Machine.


Pranjal Aggarwal et al, GEO: Generative Engine Optimization (arxiv v3, 28 June 2024)

Callum Bains, The chatbot optimisation game: can we trust AI web searches? (Observer, 3 November 2024)

Alexander Wan, Eric Wallace, Dan Klein, What Evidence Do Language Models Find Convincing? (arxiv v2, 9 August 2024)

Stanford Encyclopedia of Philosophy: Jürgen Habermas (v1, 2007), Jürgen Habermas (v2, 2023)

Saturday, October 19, 2024

Towards the Habermas Machine

Google DeepMind has just announced a large language model, the Habermas Machine, which is claimed to generate a consensus position from a collection of individual views. The name is a reference to Jürgen Habermas’s theory of communicative action.

An internet search for Habermas machine throws up two previous initiatives under the same name. Firstly, an art project by Kristopher Holland.

The Habermas Machine (2006–2012) is a conceptual art experience that both examines and promotes an experiential relation to Jürgen Habermas’ grand theory for understanding human interaction. The central claim is that The Theory of Communicative Action can be experienced, reflected upon and practised when encountered within arts-based research. Habermas’ description of how our everyday lives are founded by intersubjective experience, and caught up in certain normative, objective and subjective contexts is transformed through the method of conceptual art into a process of collaborative designing, enacting and articulating. This artistic reframing makes it possible to experience the communicative structure of knowledge and the ontological structure of intersubjectivity in a practice of non-discursive ‘philosophy without text’. Feiten Holland Chemero

And secondly, an approach to Dialogue Mapping described as a device that all participants can climb into and converse with complete communicative rationality, contained in a book by @paulculmsee and Kailash Awati, and mentioned in this Reddit post Why is Dialogue Mapping not wide spread? Dialogue Mapping was developed by Jeff Conklin and others as an approach to addressing wicked problems. See also Issue Based Information Systems (IBIS).


Update

Christopher Summerfield, one of the authors of the DeepMind paper, spoke at the Royal Society on 29 October 2024. https://www.youtube.com/live/cW1Wq7_8v1Y?si=oqo8Lw7479x4QqKt&t=18890

All the examples shown in his talk were policy matters that could be reduced to Yes/No questions. Such questions would traditionally be surveyed by asking people to place themselves on a scale from Strongly Agree to Strongly Disagree, and it is easy to see how a language-based method such as the Habermas Machine offers some advantages over a numerical scale. But it is not clear how this would work for more provocative questions, let alone wicked problems.
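To make the contrast concrete, here is a minimal sketch (in Python, not taken from the DeepMind paper) of the difference between the two approaches: a numerical scale collapses each view into a single number, whereas a language-based method keeps the reasons and hands them to a model to synthesise. The draft_consensus function is a hypothetical placeholder rather than a real API, and the speed-limit opinions are invented for illustration.

```python
from statistics import mean

# Traditional survey instrument: each view is reduced to a point on a
# five-point Likert scale.
LIKERT = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

responses = ["Agree", "Strongly disagree", "Neither agree nor disagree",
             "Agree", "Disagree"]
print("Mean Likert score:", mean(LIKERT[r] for r in responses))
# The average says roughly where the group sits, but not why.

# Language-based alternative: keep the reasons and ask a model to draft a
# statement most participants could endorse. draft_consensus is a
# hypothetical stand-in for an LLM call, not the Habermas Machine itself.
def draft_consensus(opinions: list[str]) -> str:
    """Placeholder for a model that synthesises a group statement."""
    return "Draft statement for review, covering: " + " | ".join(opinions)

opinions = [
    "A lower limit makes residential streets safer for children.",
    "Lower limits lengthen journeys without clear safety benefits.",
    "Any limit should depend on the road, not a blanket rule.",
]
print(draft_consensus(opinions))
```

The point of the sketch is simply that the numerical route discards exactly the material - the reasons people give - that a language-based method is able to work with.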

Someone in the audience asked if this method would work in what he called a compromised democracy, and Summerfield acknowledged that the method assumes what he called a good faith scaffold. Obviously all democracies in the real world are imperfect. He didn't go into the question of how sensitive or vulnerable the method might be to such imperfections, but it might conceivably help to overcome some of them under some conditions: for example, Summerfield referred specifically to the tyranny of the majority.

While the performance of the Habermas machine in their study compared favourably with the performance of human mediators, Summerfield suggested that we should move away from thinking about AI in these terms. The point is not to create AI-based agents that can behave like intelligent people but to build intelligent institutions - tools for creating social order and fostering cooperation. As my regular readers will know, orgintelligence has long been an important theme for this blog. See for example my post On Organizations and Machines (January 2022).


Jeffrey Conklin, Dialogue Mapping: Building Shared Understanding of Wicked Problems (Wiley 2006). See also CogNexus website.

Paul Culmsee and Kailash Awati, The Heretic's Guide to Best Practices (2013)

Nicola Davis, AI mediation tool may help reduce culture war rifts, say researchers (Guardian, 17 October 2024)

Tim Elmo Feiten, Kristopher Holland and Anthony Chemero, Doing philosophy with a water-lance: art and the future of embodied cognition (Adaptive Behavior 2021) 

Michael Tessler et al, AI can help humans find common ground in democratic deliberation (Science, 18 October 2024)

Beyond the symbols vs signals debate (The Royal Society, 28-29 October 2024)

Wikipedia: Issue Based Information Systems (IBIS), Wicked Problem

See also Influencing the Habermas Machine (November 2024)

Thursday, June 6, 2024

All our eyes on the disgraceful Horizon

The scandal at the British Post Office, details of which are now emerging in the public inquiry, provides illustrations of many important aspects of organizational behaviour as discussed on this blog.

Willful blindness. There is a strong attachment to a false theory, despite mounting evidence to the contrary, as well as the appalling human consequences.

Misplaced trust. Trusting a computer system (Horizon) above hundreds of ordinary people. And both the legal system and government ministers trusting the evidence presented by a public corporation, despite the fact that contrary evidence from expert witnesses had been accepted in a small number of cases (see below).

Defensive denial as one of the symptoms of organizational stupidity. In July 2013, Post Office boss Paula Vennells was told about faults in the Horizon system - faults the Post Office had denied for years - and was advised that denying them would be dangerous and stupid. ITV March 2024

A detail that struck me yesterday was a failure to connect the dots. In 2011, the auditors (EY) raised concerns about data quality, warning that if Horizon was not accurate, they would not be able to sign off the Post Office company accounts. Ms Perkins, the former Post Office chair, who was giving evidence at the inquiry into the scandal, said that at the time she did not make a link between the two. BBC June 2024. The pattern I'm seeing here is of treating the sole purpose of audit as satisfying some regulatory requirement, with zero operational (let alone ethical) implications attached to anything the auditors might find, and of assuming the regulatory requirement itself to have no real purpose, being merely a stupid and meaningless piece of bureaucracy.

Another failure to connect the dots occurred after Julie Wolstenholme successfully challenged the Post Office in 2003 with the aid of an expert technical witness. Why didn't this prompt serious questions about all the other cases? When asked about this at the inquiry, David Mills said he had not properly assimilated the information and pleaded lack of intelligence, saying I wasn’t that clever. I’m sorry, I didn’t ask about it. ITV April 2024

In my other pieces about organizational intelligence, I have noted that stupid organizations may sometimes be composed of highly intelligent people. Now that's one pattern the Post Office doesn't seem to illustrate. Or have the Post Office bosses merely chosen to present themselves as naive and incompetent rather than evil?


Tom Espiner, Ex-Post Office chair was told of IT risks in 2011 (BBC 5 June 2024)

ITV, Secret tape shows Paula Vennells was told about problems with Horizon and warned not to cover it up (29 March 2024)

ITV, Former Post Office boss tells inquiry he was not 'clever' enough to question Horizon IT system (16 April 2024)

Other Sources: Post Office Horizon IT Inquiry, British Post Office scandal (Wikipedia), Post Office Project (University of Exeter)

Friday, May 31, 2024

Thinking Academically

At Goldsmiths University yesterday for a discussion on Paratactical Life with Erin Manning and Brian Massumi. Academic jobs at Goldsmiths are currently threatened by a so-called Transformation Programme, similar to management initiatives at many other universities, which gave critical urgency to the questions facing those in the room: the primary task of the university in society, and the double task of the academic - for which Erin Manning advocates what she calls strategic duplicity.

This involves recognizing what works in the systems we work against. Which means: We don't just oppose them head on. We work with them, strategically, while nurturing an alien logic that moves in very different directions. One of the things we know that the university does well is that it attracts really interesting people. The university can facilitate meetings that can change lives. But systemically, it fails. And the systemic failure is getting more and more acute. Todoroff

One of the domains in which this duplicity is apparent is thinking itself. And this word thinking appears to have special resonance and meaning for academics - what academia calls thinking is not quite the same as what business calls thinking (which was the focus of my practitioner book on Organizational Intelligence) and certainly not the same as what tech calls thinking (the focus of Adrian Daub's book).

One of the observations that led to my work on Organizational Intelligence was the disconnect between the intelligence of the members of an organization and the intelligence of the organization itself. Universities are great examples of this, packed with clever people and yet the organization itself manifests multiple forms of stupidity. As of course do many other kinds of organization. I still believe that it is a worthwhile if often frustrating exercise to try to improve how a given organization collectively makes sense of and anticipates the demands placed on it by its customers and other stakeholders - in other words, how it thinks. However, any such improvements would be almost entirely at the micropolitical level; I don't have much idea how one would go about dismantling what Deleuze calls the economy of stupidity.

Although I think the concept of organizational intelligence is a reasonable one, and have defended it here against those who argue that organizational functions and dysfunctions can always be reduced to the behaviours and intentions of individual human actors, I don't imagine that an organization will ever think in quite the way a person thinks. There are some deficiencies in organizational thinking, just as there are deficiencies in algorithmic thinking. For example, there are some interesting issues in relation to temporality, raised in some of the contributions to Subjectivity's Special Issue on Algorithms which I guest-edited last year.

For Brian Massumi, the key question is what is thinking for. In an academic context, we might imagine the answer to be something to do with knowledge - universities being where knowledge is created and curated, and where students are supposed to acquire socioeconomic advantage based on their demonstrated mastery of selected portions of this knowledge. Therefore much of the work of an academic is taken up with a form of thinking known as judgment or sorting out - deciding, agreeing and explaining the criteria by which students will be evaluated, using these criteria to assess the work of each student, and helping those students who don't fit the expected pattern for whatever reason.

But what really gives a student any benefit in the job market as a result of their studies is not just a piece of paper but a sense of their potential - for both thinking and doing. The problem with students using chatbots to write their assignments is not that they are cheating - after all, the ability to cheat without being found out is highly valued in many organizations, if not essential. The real problem is if they are learning a deficient form of thinking.

(This is far from a complete report on the afternoon, merely picking out some elements of the discussion that resonated with me.)

 

Update: Comments have been added to the goodreads version of this post.


Philip Boxer, The Three Asymmetries necessary to describing agency in living biological systems (Asymmetric Leadership, November 2023)

Philip Boxer, The Doubling of the Double Task (Asymmetric Leadership, February 2024)

Adrian Daub, What Tech Calls Thinking (Farrar Straus and Giroux, 2020)

Benoît Dillet, What Is Called Thinking?: When Deleuze Walks along Heideggerian Paths (Deleuze Studies 7/2 2013)

Kenan Malik, The affluent can have their souls enriched at university, so why not the poor as well? (Observer, 2 June 2024)

Brent Dean Robbins, Joyful Thinking-Thanking: A Reading of Heidegger’s “What is Called Thinking?” (Janus Head 13/2, October 2014) 

Uriah Marc Todoroff, A Cryptoeconomy of Affect (New Inquiry, May 2018)

Richard Veryard, Building Organizational Intelligence (Leanpub, 2012)

Richard Veryard, As we may think now (Subjectivity December 2023)

Related posts: Symptoms of Organizational Stupidity (May 2010), On Organizations and Machines (January 2022), Reasoning with the majority - chatGPT (January 2023), Creativity and Recursivity (September 2023)

Saturday, February 24, 2024

Anticipating Effects

There has been much criticism of the bias and distortion embedded in many of our modern digital tools and platforms, including search. Google recently released an AI image generation model that over-compensated for this, producing racially diverse images even for situations where such diversity would be historically inaccurate. With well-chosen prompts, this feature was made to look either ridiculous or politically dangerous (aka "woke"), and Google has paused the generation of images of people while the feature undergoes further refinement and testing.

I've just been reading an extended thread from Yishan Wong about this episode.

The bigger problem he identifies is the inability of the engineers to anticipate and constrain the behaviour of a complex intelligent system - as in many of Asimov's stories, where the robots often behave in dangerous ways.

Some writers on technology ethics have called for ethical principles to be embedded in technology, along the lines of Asimov's Laws. I have challenged this idea in previous posts, because as I see it the whole point of the Three Laws is that they don't work properly. Thus my reading of Asimov's stories is similar to Yishan's.

It looks like their testing didn't take context of use into account. 

Update: Or as Dame Wendy Hall noted later, This is not just safety testing, this is does-it-make-any-sense training.



Dan Milmo, Google pauses AI-generated images of people after ethnicity criticism (Guardian, 22 February 2024) 

Dan Milmo and Alex Hern, ‘We definitely messed up’: why did Google AI tool make offensive historical images? (Guardian, 8 March 2024)

Related posts: Reinforcing Stereotypes (May 2007), Purpose of Diversity (January 2010) (December 2014), Automation Ethics (August 2019), Algorithmic Bias (March 2021)