<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US"><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://www.johnsosoka.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.johnsosoka.com/" rel="alternate" type="text/html" hreflang="en-US" /><updated>2026-04-01T20:06:43-06:00</updated><id>https://www.johnsosoka.com/feed.xml</id><title type="html">johnsosoka.com</title><subtitle>Writing Code in Boise.</subtitle><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><entry><title type="html">Claude Code Thoughts, Patterns &amp;amp; Workflows</title><link href="https://www.johnsosoka.com/note/2026/01/03/claude-code-patterns.html" rel="alternate" type="text/html" title="Claude Code Thoughts, Patterns &amp;amp; Workflows" /><published>2026-01-03T00:00:00-07:00</published><updated>2026-01-03T00:00:00-07:00</updated><id>https://www.johnsosoka.com/note/2026/01/03/claude-code-patterns</id><content type="html" xml:base="https://www.johnsosoka.com/note/2026/01/03/claude-code-patterns.html"><![CDATA[<p>Initial notes on working with Claude Code.</p>

<h2 id="thoughts">Thoughts</h2>

<ul>
  <li>I need better patterns in managing my own context as well as better patterns for managing agent/subagent context.</li>
  <li>The cost of refactoring is very low now. This can change how our iterative rounds on software products look.</li>
  <li>The very beginning &amp; end of these AI-implemented projects require the closest human attention.</li>
</ul>

<h2 id="patterns">Patterns</h2>

<ul>
  <li>In my user-scoped <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> I inform the agents to use <code class="language-plaintext highlighter-rouge">{project_root}/llm_memory</code> for working docs, plans, etc.
    <ul>
      <li>This helps prolong working sessions. I can often skip compact when context is low and just point to relevant docs in the folder.</li>
      <li><strong>Warning:</strong> this does require pruning. I have a <code class="language-plaintext highlighter-rouge">/</code> command guiding the orchestration LLM to prune, but the trick is remembering to invoke it :)</li>
    </ul>
  </li>
</ul>

<h2 id="workflows">Workflows</h2>

<ul>
  <li>Achieve longer sessions with <code class="language-plaintext highlighter-rouge">projects</code> subdir within <code class="language-plaintext highlighter-rouge">llm_memory</code> folder</li>
</ul>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="note" /><summary type="html"><![CDATA[Initial notes on working with Claude Code.]]></summary></entry><entry><title type="html">Creating an AI Powered 3D Printer Object Design Factory</title><link href="https://www.johnsosoka.com/blog/2025/12/26/ai-3d-printer-design-factory.html" rel="alternate" type="text/html" title="Creating an AI Powered 3D Printer Object Design Factory" /><published>2025-12-26T00:00:00-07:00</published><updated>2025-12-26T00:00:00-07:00</updated><id>https://www.johnsosoka.com/blog/2025/12/26/ai-3d-printer-design-factory</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2025/12/26/ai-3d-printer-design-factory.html"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>I recently purchased my <em>first</em> 3d printer for Christmas, and soon found out I’m probably one of the last among my friends to own one. It has proven to be immediately useful, especially with all of the pre-made objects on websites like <a href="https://www.printables.com">printables.com</a>. I soon found myself needing a few custom objects for which no existing prints were available.</p>

<p>Unfortunately, I have zero 3d modeling experience. I do however have expertise in software and AI Agent engineering. I decided to set out on a quest to build a multi-agent object creation team to help build 3d models for objects I needed to print. Within hours, I had some sharp looking models ready to print. Below are a few snippets–these are intended to be used as under-desk mounts for <a href="https://www.schiit.com/">Schiit</a> audio devices.</p>

<p><strong>AI Created Results (Under Desk C-Mounts)</strong></p>

<p><img src="https://media.johnsosoka.com/blog/2025-12-26/left-bracket.png" alt="schitt-single-mount-L" /></p>

<p><em>From the above image, you can see there are large holes to make installing the screw in the upper mounts easier as a tool can now pass through.</em></p>

<p><img src="https://media.johnsosoka.com/blog/2025-12-26/schiit-stacked-magnius-modius-left.png" alt="schit-stacked-mount" /></p>

<p><em>Hexagonal ventilation can be seen above.</em></p>

<h2 id="trial-and-error">Trial and Error</h2>

<p>What kicked off this project was seeing that <a href="https://github.com/neka-nat/freecad-mcp">freecad-mcp</a> existed. I installed and configured it in FreeCAD using the directions. Unfortunately, the MCP responses from the tooling are so large that you can only complete about 3 or 4 turns before the context fills. My first approach was to deploy a Haiku subagent “CAD Object Viewer” which was equipped with an ephemeral context and <em>read only</em> CAD MCP tooling. The orchestrator agent was tasked with using the subagent to view objects and receive descriptions to preserve the main agent context. Unfortunately, this path forward only unlocked a handful more total turns in the conversation. Context still degraded and the results yielded were poor.</p>

<h2 id="freecad-macros--3d-model-as-code">FreeCAD Macros / 3d-model-as-code.</h2>

<p>I learned that FreeCAD supported <a href="https://wiki.freecad.org/Macros">Python Macros</a> which can include python specs that provide macro instructions to create objects!!! This is tremendously helpful for a few reasons.</p>

<ul>
  <li>Python code informing object construction preserves context
    <ul>
      <li>Re-usable components</li>
      <li>Dynamically generated components (scaling vents/dimensions)</li>
      <li>version control for 3d models!</li>
    </ul>
  </li>
</ul>
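
<p>As a rough illustration of the “dynamically generated components” point, here is a minimal sketch in plain Python (it does not call the FreeCAD API). The function name <code class="language-plaintext highlighter-rouge">hex_vent_centers</code> and its parameters are hypothetical and are not taken from my actual specs; the idea is only that a spec can compute vent positions from the panel dimensions, so the vent pattern rescales when the dimensions change.</p>

```python
import math


def hex_vent_centers(width, height, hex_radius, gap):
    """Return (x, y) centers for a honeycomb of pointy-top hexagonal vents
    that fits inside a width x height panel (all values in millimetres).

    hex_radius is the center-to-corner distance; gap is the wall thickness
    left between neighbouring vents.
    """
    half_width = math.sqrt(3) * hex_radius / 2    # center to flat side
    pitch = 2 * half_width + gap                  # center-to-center distance
    row_pitch = pitch * math.sqrt(3) / 2          # vertical step between rows

    centers = []
    row = 0
    y = hex_radius
    while height - y >= hex_radius:               # hexagon must fit vertically
        # Odd rows are shifted by half a pitch to form the honeycomb.
        x = half_width + (pitch / 2 if row % 2 else 0.0)
        while width - x >= half_width:            # hexagon must fit horizontally
            centers.append((round(x, 3), round(y, 3)))
            x += pitch
        y += row_pitch
        row += 1
    return centers


if __name__ == "__main__":
    small = hex_vent_centers(width=40, height=20, hex_radius=3, gap=1)
    large = hex_vent_centers(width=80, height=40, hex_radius=3, gap=1)
    print(len(small), "vents on the small panel")
    print(len(large), "vents on the large panel")
```

<p>In a real macro, each center would then be handed to whatever FreeCAD call cuts the hexagonal hole (not shown here). Because the layout is just code, it can be reviewed, diffed, and committed like any other source file.</p>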

<p>Now we can commit our progress and roll back to known good models. I was starting to feel much more comfortable in the realm of CAD/3d modeling. After gaining comfort using a python spec to generate objects in FreeCAD (Macro -&gt; Macros -&gt; Specify Script Location), I decided to build out my domain specialist team.</p>

<h3 id="usable-workflow">Usable Workflow</h3>

<p>In the <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> for the orchestrator, I used a pattern I’ve employed on other projects and specified a directory <code class="language-plaintext highlighter-rouge">llm_memory</code> for the agents to hand off materials to each other. The main markdown provides several workflow examples, capturing the general flow:</p>
<ul>
  <li>Mechanical Designer decomposes task, writes detailed spec</li>
  <li>CAD Spec Implementer writes the code</li>
  <li>Mechanical Designer then
    <ul>
      <li>Executes the script via MCP (to render the object)</li>
      <li>Gets a view via MCP (to view the object)</li>
      <li>Provides feedback or approves</li>
    </ul>
  </li>
</ul>

<p>Below is a cluttered flow diagram capturing my current process.</p>

<p><img src="https://media.johnsosoka.com/blog/2025-12-26/workflow-diagram-v2.png" alt="AI 3D Modeling Workflow" /></p>

<p>The end result is that by avoiding MCP to build the objects I can preserve context and have much longer working sessions. Further, when context is preserved for each domain expert, they excel at their specific tasks (the Mechanical Designer’s context isn’t cluttered with Python scripts or huge payloads describing CAD object meshes).</p>

<h2 id="conclusion--the-future">Conclusion &amp; The Future</h2>

<p>The workflow I’ve outlined above is functional, but far from perfect. It’s what I cobbled together over the course of a day of tinkering and I wanted to write a quick post capturing it. I’ve already printed a handful of mounts and am continuing to iterate/test materials.</p>

<p>I think that in the future I may explore the MCP integration a bit more and try to identify context preservation opportunities–like compressing the base64 encoded object views prior to sending to the LLM, or potentially using tiny models to describe images instead.</p>

<p>I’ll definitely be starting a public GitHub repo with some of the python specs for the objects I’ve been having AI create.</p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="ai" /><category term="llm" /><category term="3d-printing" /><category term="cad" /><category term="openscad" /><category term="design" /><category term="automation" /><category term="factory" /><summary type="html"><![CDATA[Introduction]]></summary></entry><entry><title type="html">Getting LLMs to Use Data You Don’t Let Them See</title><link href="https://www.johnsosoka.com/blog/2025/11/29/llms-and-hidden-data.html" rel="alternate" type="text/html" title="Getting LLMs to Use Data You Don’t Let Them See" /><published>2025-11-29T00:00:00-07:00</published><updated>2025-11-29T00:00:00-07:00</updated><id>https://www.johnsosoka.com/blog/2025/11/29/llms-and-hidden-data</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2025/11/29/llms-and-hidden-data.html"><![CDATA[<blockquote>
  <p>How can we reliably get an LLM to act on and interact with information that we’ve chosen not to expose to it?</p>
</blockquote>

<h2 id="introduction--problem">Introduction &amp; Problem</h2>

<p>LLMs are working their way into more and more business processes either as agents with scoped autonomy or as part of larger, more deterministic workflows and pipelines. These new LLM-powered agents and workflows are remarkably good at ingesting and acting upon unstructured data–but sometimes that unstructured data contains information we do not want to send to a 3rd party LLM provider, no matter how much they assure us our data is safe.</p>

<p>This creates a paradox: the kind of data LLMs excel at working with often contains the same data we can’t let them see.</p>

<h2 id="the-solution">The Solution</h2>

<p>To solve today’s problem, we will implement our own custom agent middleware and leverage the <code class="language-plaintext highlighter-rouge">wrap_model_call</code> method defined by the base class. When our middleware is attached, we will have access to the inference request <em>before</em> it is sent to a remote LLM provider. This level of access gives us the ability to:</p>

<ol>
  <li>Detect sensitive data in user messages (phone numbers, SSNs, email addresses, etc.)</li>
  <li>Swap each sensitive value for a generated placeholder and store the mapping in a registry</li>
  <li>Send only the masked messages to the LLM provider</li>
  <li>Restore the original values in the response before returning it to the user—or before calling downstream tools that need the real data</li>
</ol>
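
<p>The four steps above can be sketched without any framework at all. The following is a minimal, illustrative version in plain Python, not the code from the repository: the class name <code class="language-plaintext highlighter-rouge">PiiRegistry</code>, its methods, and the deliberately simple regexes are assumptions made for this example, and real detection would need far broader coverage.</p>

```python
import re
import uuid

# Illustrative patterns only; production detection needs much more than this.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


class PiiRegistry:
    """Hypothetical in-memory mapping between placeholders and real values."""

    def __init__(self):
        self._by_placeholder = {}
        self._by_value = {}

    def register(self, kind, value):
        # Reuse the placeholder if this value has been seen before.
        if value not in self._by_value:
            placeholder = f"[{kind}:{uuid.uuid4().hex[:8]}]"
            self._by_value[value] = placeholder
            self._by_placeholder[placeholder] = value
        return self._by_value[value]

    def mask(self, text):
        # Steps 1 and 2: detect each sensitive value and swap in a placeholder.
        for kind, pattern in PII_PATTERNS.items():
            text = pattern.sub(lambda m, k=kind: self.register(k, m.group(0)), text)
        return text

    def unmask(self, text):
        # Step 4: restore the original values wherever a placeholder appears.
        for placeholder, value in self._by_placeholder.items():
            text = text.replace(placeholder, value)
        return text


if __name__ == "__main__":
    registry = PiiRegistry()
    masked = registry.mask("My phone number is 555-867-5309.")
    print(masked)                    # step 3: only this text would leave the process
    print(registry.unmask(masked))   # original text restored locally
```

<p>Reusing one placeholder per distinct value keeps the masked conversation consistent across turns, which matters once the model has to refer back to the same value later.</p>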

<p>Below is a diagram capturing the overall flow in a scenario where the LLM must leverage masked PII to invoke a verification tool.</p>

<p><img src="https://media.johnsosoka.com/blog/2025-11-29/flow-diagram-w-tools.png" alt="llm-masking-flow-diagram" /></p>
<h2 id="implementation">Implementation</h2>

<p>The complete code for this example is available in my <a href="https://github.com/johnsosoka/code-examples/tree/main/python/langchain-inference-masking">code-examples</a> repository. This post aims to focus on the high level strategy and new LangChain framework offerings rather than the details of writing regex to identify a social security number :)</p>

<h3 id="langchain-middleware">LangChain Middleware</h3>

<p>As part of the much anticipated 1.0 release, LangChain added <a href="https://blog.langchain.com/agent-middleware/">agent middleware</a>, which provides hooks that fire at specific points around model inference:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">before_model</code> - Executes implemented method prior to inference</li>
  <li><code class="language-plaintext highlighter-rouge">after_model</code> - Executes implemented method after inference</li>
  <li><code class="language-plaintext highlighter-rouge">wrap_model_call</code> - Wraps the inference call itself, so the pending request (and the response) can be modified.</li>
</ul>

<p>In practice, these hooks let you execute custom logic at the exact boundary where data crosses from your application into the LLM provider. This solution is only one small use case for the LLM engineering opportunities opened up by these new middleware hooks.</p>

<h3 id="building-the-middleware">Building the Middleware</h3>

<p>LangChain provides an <code class="language-plaintext highlighter-rouge">AgentMiddleware</code> base class which defines the middleware methods to be invoked around model inference. We’re going to implement <code class="language-plaintext highlighter-rouge">pii_masking</code> middleware that leverages the <a href="https://github.com/johnsosoka/code-examples/blob/main/python/langchain-inference-masking/src/services/pii_registry.py">PII registry</a>, a singleton key/value store accessible throughout the application. The purpose of the registry is to exchange PII for a unique ID and vice-versa.</p>

<p><strong>Note:</strong> <em>for demo simplicity, this uses an in-memory singleton registry. In production you’d want a request-scoped or externalized mapping to ensure thread/process safety.</em></p>
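
<p>One possible shape for a request-scoped mapping, sketched with the standard library’s <code class="language-plaintext highlighter-rouge">contextvars</code> module. This is an assumption about how it could be done, not what the example repository does, and the helper names <code class="language-plaintext highlighter-rouge">pii_scope</code>, <code class="language-plaintext highlighter-rouge">remember</code>, and <code class="language-plaintext highlighter-rouge">resolve</code> are hypothetical. It does nothing for multi-process deployments, where an external store would be needed.</p>

```python
import contextvars
from contextlib import contextmanager

# Holds the placeholder -> real value mapping for the current request only.
_current_mapping = contextvars.ContextVar("pii_mapping")


@contextmanager
def pii_scope():
    """Open a fresh, empty mapping for the duration of one request."""
    token = _current_mapping.set({})
    try:
        yield _current_mapping.get()
    finally:
        # Drop the mapping so values never leak into the next request.
        _current_mapping.reset(token)


def remember(placeholder, value):
    """Record a placeholder for the request currently in scope."""
    _current_mapping.get()[placeholder] = value


def resolve(placeholder):
    """Return the real value for a placeholder, or None if it is unknown."""
    return _current_mapping.get().get(placeholder)


if __name__ == "__main__":
    with pii_scope():
        remember("[PHONE:59c0b4a6]", "555-867-5309")
        print(resolve("[PHONE:59c0b4a6]"))
    with pii_scope():
        # A new request starts empty, so the earlier placeholder is unknown.
        print(resolve("[PHONE:59c0b4a6]"))
```

<p>Calling <code class="language-plaintext highlighter-rouge">remember</code> or <code class="language-plaintext highlighter-rouge">resolve</code> outside of a <code class="language-plaintext highlighter-rouge">pii_scope()</code> block raises <code class="language-plaintext highlighter-rouge">LookupError</code> in this sketch, which makes accidental use of a shared global mapping fail loudly.</p>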

<p>To start, our class extends <code class="language-plaintext highlighter-rouge">AgentMiddleware</code>. You can view the complete <a href="https://github.com/johnsosoka/code-examples/blob/main/python/langchain-inference-masking/src/middleware/pii_masking.py">custom middleware class here</a>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">class</span> <span class="nc">PiiMaskingMiddleware</span><span class="p">(</span><span class="n">AgentMiddleware</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    Middleware that masks PII in messages before LLM inference,
    then restores original values in responses.

    Supported PII types:
        - Email addresses
        - Phone numbers (US format)
        - Social Security Numbers (SSN)

    The middleware maintains an in-memory registry mapping placeholders
    to original values, enabling round-trip masking/unmasking.

    Example usage:
        from langchain.agents import create_agent

        middleware = PiiMaskingMiddleware()
        agent = create_agent(
            model=</span><span class="sh">"</span><span class="s">gpt-4o</span><span class="sh">"</span><span class="s">,
            tools=[],
            middleware=[middleware]
        )
    </span><span class="sh">"""</span>
</code></pre></div></div>

<p>The next interesting piece in our middleware class is where we implement <code class="language-plaintext highlighter-rouge">wrap_model_call</code>. In this method our logic iterates through every message in the original request, building a new list of messages that have masked PII information:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">def</span> <span class="nf">wrap_model_call</span><span class="p">(</span>
        <span class="n">self</span><span class="p">,</span>
        <span class="n">request</span><span class="p">:</span> <span class="n">ModelRequest</span><span class="p">,</span>
        <span class="n">handler</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="n">ModelRequest</span><span class="p">],</span> <span class="n">ModelResponse</span><span class="p">],</span>
    <span class="p">)</span> <span class="o">-&gt;</span> <span class="n">ModelResponse</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">
        Wrap the model call to mask PII before and unmask after.

        This hook intercepts the actual model request, masks PII in messages,
        calls the model, then restores PII in the response.

        Args:
            request: The model request containing messages to be sent
            handler: The function to call the model

        Returns:
            ModelResponse with PII restored in the output
        </span><span class="sh">"""</span>
        <span class="c1"># Mask PII in all messages before sending to LLM
</span>        <span class="n">masked_messages</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">AnyMessage</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">msg</span> <span class="ow">in</span> <span class="n">request</span><span class="p">.</span><span class="n">messages</span><span class="p">:</span>
            <span class="n">masked_messages</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="nf">_mask_message</span><span class="p">(</span><span class="n">msg</span><span class="p">))</span>

        <span class="n">pii_count</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">_registry</span><span class="p">.</span><span class="n">registry</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">pii_count</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Masked </span><span class="si">{</span><span class="n">pii_count</span><span class="si">}</span><span class="s"> PII value(s) before model call</span><span class="sh">"</span><span class="p">)</span>

        <span class="c1"># Create new request with masked messages
</span>        <span class="n">masked_request</span> <span class="o">=</span> <span class="n">request</span><span class="p">.</span><span class="nf">override</span><span class="p">(</span><span class="n">messages</span><span class="o">=</span><span class="n">masked_messages</span><span class="p">)</span>

        <span class="c1"># Call the model with masked messages
</span>        <span class="n">response</span> <span class="o">=</span> <span class="nf">handler</span><span class="p">(</span><span class="n">masked_request</span><span class="p">)</span>
</code></pre></div></div>

<p>The code which masks the fields (if identified) also registers the original sensitive data with our PII Registry. When the model returns a result, similar logic executes to restore the original sensitive data in the response.</p>

<p><strong>Note:</strong> This is <em>after</em> the LLM has been invoked, so the 3rd party has never been exposed to the sensitive information at this point.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        <span class="c1"># Call the model with masked messages
</span>        <span class="n">response</span> <span class="o">=</span> <span class="nf">handler</span><span class="p">(</span><span class="n">masked_request</span><span class="p">)</span>

        <span class="c1"># Unmask PII in the response messages
</span>        <span class="k">if</span> <span class="n">response</span><span class="p">.</span><span class="n">result</span><span class="p">:</span>
            <span class="n">unmasked_results</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">BaseMessage</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
            <span class="n">modified</span> <span class="o">=</span> <span class="bp">False</span>

            <span class="k">for</span> <span class="n">msg</span> <span class="ow">in</span> <span class="n">response</span><span class="p">.</span><span class="n">result</span><span class="p">:</span>
                <span class="k">if</span> <span class="nf">isinstance</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span> <span class="n">AIMessage</span><span class="p">)</span> <span class="ow">and</span> <span class="nf">isinstance</span><span class="p">(</span><span class="n">msg</span><span class="p">.</span><span class="n">content</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
                    <span class="n">unmasked_content</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="nf">_unmask_pii_in_text</span><span class="p">(</span><span class="n">msg</span><span class="p">.</span><span class="n">content</span><span class="p">)</span>
                    <span class="k">if</span> <span class="n">unmasked_content</span> <span class="o">!=</span> <span class="n">msg</span><span class="p">.</span><span class="n">content</span><span class="p">:</span>
                        <span class="n">unmasked_results</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="nc">AIMessage</span><span class="p">(</span><span class="n">content</span><span class="o">=</span><span class="n">unmasked_content</span><span class="p">))</span>
                        <span class="n">modified</span> <span class="o">=</span> <span class="bp">True</span>
                    <span class="k">else</span><span class="p">:</span>
                        <span class="n">unmasked_results</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>
                <span class="k">else</span><span class="p">:</span>
                    <span class="n">unmasked_results</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>

            <span class="k">if</span> <span class="n">modified</span><span class="p">:</span>
                <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Restored PII in model response</span><span class="sh">"</span><span class="p">)</span>
                <span class="k">return</span> <span class="nc">ModelResponse</span><span class="p">(</span>
                    <span class="n">result</span><span class="o">=</span><span class="n">unmasked_results</span><span class="p">,</span>
                    <span class="n">structured_response</span><span class="o">=</span><span class="n">response</span><span class="p">.</span><span class="n">structured_response</span><span class="p">,</span>
                <span class="p">)</span>

        <span class="k">return</span> <span class="n">response</span>
</code></pre></div></div>

<p><em>Again, the complete middleware code along with the PII Registry are available on <a href="https://github.com/johnsosoka/code-examples/tree/main/python/langchain-inference-masking">GitHub</a>. I’m using truncated examples for brevity in this post.</em></p>

<p>From the above snippets, you should now see how we can attach custom code via <code class="language-plaintext highlighter-rouge">AgentMiddleware</code> to modify requests right before inference time. The middleware pattern is tremendously useful.</p>

<h3 id="testing--observability">Testing &amp; Observability</h3>

<p>I’ve set up two tests. The first only checks that sensitive data is removed from the user’s message prior to inference &amp; that the original PII is restored after inference. The second adds a tool that needs the unmasked data. First, let’s explore how the simpler test is constructed &amp; functions when executed.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">run_simple_demo</span><span class="p">():</span>  
    <span class="sh">"""</span><span class="s">  
    Run an agent with PII masking middleware using LangChain 1.0 create_agent.  
    The middleware automatically intercepts messages before sending to the LLM,
    masks PII, then restores it in the response. LangSmith will show
    the masked values in the trace.
    </span><span class="sh">"""</span>
    <span class="n">config</span> <span class="o">=</span> <span class="nc">ConfigLoader</span><span class="p">()</span>
  
    <span class="n">model</span> <span class="o">=</span> <span class="nc">ChatOpenAI</span><span class="p">(</span>  
        <span class="n">model</span><span class="o">=</span><span class="n">config</span><span class="p">.</span><span class="n">model_name</span><span class="p">,</span>  
        <span class="n">api_key</span><span class="o">=</span><span class="n">config</span><span class="p">.</span><span class="n">openai_api_key</span><span class="p">,</span>  
    <span class="p">)</span>  
  
    <span class="c1"># Create middleware instance  
</span>    <span class="n">middleware</span> <span class="o">=</span> <span class="nc">PiiMaskingMiddleware</span><span class="p">()</span>  
  
    <span class="c1"># System prompt for the demo  
</span>    <span class="n">system_prompt</span> <span class="o">=</span> <span class="p">(</span>  
        <span class="sh">"</span><span class="s">You are a helpful assistant participating in a PII masking middleware test. </span><span class="sh">"</span>  
        <span class="sh">"</span><span class="s">When the user asks you to repeat information back, please do so exactly as provided. </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">This is a controlled test environment.</span><span class="sh">"</span>
    <span class="p">)</span>
  
    <span class="c1"># Create agent with middleware - LangChain 1.0 API  
</span>    <span class="n">agent</span> <span class="o">=</span> <span class="nf">create_agent</span><span class="p">(</span>  
        <span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>  
        <span class="n">tools</span><span class="o">=</span><span class="p">[],</span>  <span class="c1"># No tools needed for this demo  
</span>        <span class="n">middleware</span><span class="o">=</span><span class="p">[</span><span class="n">middleware</span><span class="p">],</span>  
        <span class="n">system_prompt</span><span class="o">=</span><span class="n">system_prompt</span><span class="p">,</span>  
    <span class="p">)</span>  
  
    <span class="c1"># Example message with PII  
</span>    <span class="n">user_message</span> <span class="o">=</span> <span class="p">(</span>  
        <span class="sh">"</span><span class="s">For this middleware test, my phone number is 555-867-5309. </span><span class="sh">"</span>  
        <span class="sh">"</span><span class="s">Please repeat my phone number back to me exactly as I wrote it.</span><span class="sh">"</span>
    <span class="p">)</span>
  
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="se">\n</span><span class="si">{</span><span class="sh">'</span><span class="s">=</span><span class="sh">'</span><span class="o">*</span><span class="mi">60</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>  
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Running agent with PII masking middleware (LangChain 1.0)</span><span class="sh">"</span><span class="p">)</span>  
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Original user message: </span><span class="si">{</span><span class="n">user_message</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>  
  
    <span class="c1"># Invoke the agent - middleware is applied automatically  
</span>    <span class="n">result</span> <span class="o">=</span> <span class="n">agent</span><span class="p">.</span><span class="nf">invoke</span><span class="p">({</span><span class="sh">"</span><span class="s">messages</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="nc">HumanMessage</span><span class="p">(</span><span class="n">content</span><span class="o">=</span><span class="n">user_message</span><span class="p">)]})</span>  
  
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Agent response: </span><span class="si">{</span><span class="n">result</span><span class="p">[</span><span class="sh">'</span><span class="s">messages</span><span class="sh">'</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="n">content</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>  
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="se">\n</span><span class="s">PII Registry: </span><span class="si">{</span><span class="n">middleware</span><span class="p">.</span><span class="n">_mask_registry</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>You can see in the above we prepare the test situation, and pass a user message that includes a phone number (which will trigger our Middleware). Also note that we equip the custom middleware when we construct the agent graph via:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
    <span class="n">middleware</span> <span class="o">=</span> <span class="nc">PiiMaskingMiddleware</span><span class="p">()</span>
    <span class="bp">...</span>
    <span class="n">agent</span> <span class="o">=</span> <span class="nf">create_agent</span><span class="p">(</span>
        <span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>  
        <span class="n">tools</span><span class="o">=</span><span class="p">[],</span>  <span class="c1"># No tools needed for this demo  
</span>        <span class="n">middleware</span><span class="o">=</span><span class="p">[</span><span class="n">middleware</span><span class="p">],</span>  
        <span class="n">system_prompt</span><span class="o">=</span><span class="n">system_prompt</span><span class="p">,</span>  
    <span class="p">)</span>
</code></pre></div></div>

<h4 id="validating-test-1-langsmith-observability">Validating Test #1 (LangSmith Observability)</h4>

<p><a href="https://smith.langchain.com">LangSmith</a> is an observability and evals platform–free to use for small personal projects. I leverage it for many of my weekend projects &amp; POCs. By configuring an API key and setting the following environment variables:</p>

<pre><code class="language-env">LANGSMITH_API_KEY=your-langsmith-api-key  
LANGSMITH_TRACING=true  
LANGSMITH_PROJECT=langchain-inference-masking
</code></pre>

<p>LangChain/LangGraph telemetry data will be sent to your LangSmith account. When I run the test situation from above, I can see the following in LangSmith:</p>

<p><img src="https://media.johnsosoka.com/blog/2025-11-29/simple-trace-e2e.png" alt="simple-trace" /></p>

<p>This doesn’t look very interesting, right? The high level inputs and outputs look entirely unremarkable, with the user providing their phone number &amp; the LLM repeating it back.</p>

<h5 id="diving-deeper-into-the-trace">Diving Deeper into the Trace</h5>

<p>If we drill into the actual LLM inference node and observe the inputs and outputs we can see the following:</p>

<p><img src="https://media.johnsosoka.com/blog/2025-11-29/simple-trace-llm-masked.png" alt="simple-trace-llm-masked" /></p>

<p>There it is! This specific node in the trace is where OpenAI is invoked. Look carefully at the inputs and outputs…</p>

<p><code class="language-plaintext highlighter-rouge">[PHONE:59c0b4a6]</code></p>

<p>We can see the middleware working! In the request that was actually sent over the wire to our 3rd party LLM provider (OpenAI), all sensitive information was replaced with a unique ID prior to inference &amp; then restored post-inference.</p>
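
<p>The mask-then-restore exchange can be sketched in plain Python. This is only an illustration and not the middleware from the repository: the phone regex, the in-memory <code class="language-plaintext highlighter-rouge">vault</code> dict, the hash-derived 8-character ID, and the sample phone number are all assumptions made for the example.</p>

```python
import hashlib
import re

# Hypothetical pattern for US-style phone numbers; real PII detection needs to be broader.
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
TOKEN_PATTERN = re.compile(r"\[PHONE:[0-9a-f]{8}\]")


def mask(text, vault):
    """Replace each phone number with a [PHONE:id] token and remember the original in vault."""

    def swap(match):
        value = match.group(0)
        # Assumption: the ID is the first 8 hex characters of a SHA-256 digest.
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
        token = "[PHONE:" + digest + "]"
        vault[token] = value
        return token

    return PHONE_PATTERN.sub(swap, text)


def unmask(text, vault):
    """Restore original values for known tokens; unknown tokens are left untouched."""
    return TOKEN_PATTERN.sub(lambda m: vault.get(m.group(0), m.group(0)), text)


vault = {}
original = "Please call me at 208-555-0134."
masked = mask(original, vault)      # what would be sent to the provider
restored = unmask(masked, vault)    # what the user would see again
```

<p>In this sketch, only <code class="language-plaintext highlighter-rouge">masked</code> would ever leave the process, while <code class="language-plaintext highlighter-rouge">vault</code> stays local so the response can be restored afterwards.</p>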

<h4 id="validating-test-2">Validating Test #2</h4>

<p>Test #2 can be found in its entirety <a href="https://github.com/johnsosoka/code-examples/blob/main/python/langchain-inference-masking/src/agents/verification_agent.py">here on GitHub</a>. In the second test, we add an additional “verification service” which requires the unmasked data to verify user information. This example is a bit more sophisticated, because now an LLM is actually <em>acting upon masked data</em> by using the exchanged ID to invoke a tool.</p>

<p>In this test, the tool exchanges the ID for the real sensitive information and then queries our dummy “verification service.” Again, even in the more complicated example, when we view the overall trace it looks totally unremarkable:</p>

<p><img src="https://media.johnsosoka.com/blog/2025-11-29/complex-trace-e2e.png" alt="complex-trace-e2e" /></p>

<p>In the above trace, everything looks as expected. The user provides “sensitive” data, the LLM invokes a tool using that sensitive data, and then returns a message confirming that the phone number is verified.</p>

<p>If we look at the actual inference requests, we can see the middleware working:</p>

<p><img src="https://media.johnsosoka.com/blog/2025-11-29/complex-trace-llm-masked.png" alt="complex-trace-llm-masked" /></p>

<p>When we view the inputs/outputs <em>actually sent</em> to OpenAI, the phone number utilized by the LLM to check the verification status is indeed masked as <code class="language-plaintext highlighter-rouge">[PHONE:59c0b4a6]</code>.</p>
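
<p>The tool-side exchange from Test #2 can be sketched roughly as follows. The <code class="language-plaintext highlighter-rouge">VAULT</code> mapping, the <code class="language-plaintext highlighter-rouge">VERIFIED_NUMBERS</code> set, the function name, and the phone number are hypothetical stand-ins for this illustration; the real agent linked above may be structured differently.</p>

```python
# Hypothetical in-memory stand-ins for the token vault and the dummy verification service.
VAULT = {"[PHONE:59c0b4a6]": "208-555-0134"}
VERIFIED_NUMBERS = {"208-555-0134"}


def check_verification_status(phone_token):
    """Tool body: exchange the masked token for the real value, then query the service."""
    real_phone = VAULT.get(phone_token)
    if real_phone is None:
        return {"phone": phone_token, "verified": False, "reason": "unknown token"}
    # Only the token goes back to the model; the raw number never leaves the tool.
    return {"phone": phone_token, "verified": real_phone in VERIFIED_NUMBERS}


result = check_verification_status("[PHONE:59c0b4a6]")
```

<p>The point of the design is that the model only ever handles the token, while the lookup of the real value happens inside the tool.</p>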

<h2 id="conclusion">Conclusion</h2>

<p>Some patterns are simply timeless! The middleware pattern in LangChain has SO MANY applications, far beyond masking sensitive data from 3rd party LLM providers. It could be leveraged for human-in-the-loop workflows, blocking pre-inference requests until a human-issued command releases a lock. Middleware could be leveraged to enrich agent context by adding to the messages array, or even to load a “side-car” LLM instance tasked with evaluating human input for malicious prompts prior to forwarding the request to the target model within the LLM Workflow.</p>

<p>Hopefully awareness of the new <code class="language-plaintext highlighter-rouge">AgentMiddleware</code> capability and some of the hands-on examples in this blog post will prove useful in your future projects.</p>

<p>Happy coding!</p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="LangChain" /><category term="Agent-Middleware" /><category term="data-masking" /><category term="privacy" /><category term="security" /><category term="python" /><category term="data-masking" /><category term="llm" /><category term="openai" /><category term="AgentMiddleware" /><category term="Custom" /><category term="Middleware" /><summary type="html"><![CDATA[How can we reliably get an LLM to act on and interact with information that we’ve chosen not to expose to it?]]></summary></entry><entry><title type="html">Weekend Project: Dynamic DNS with AWS Lambda &amp;amp; Jenkins</title><link href="https://www.johnsosoka.com/blog/2025/03/02/diy-dynamic-dns.html" rel="alternate" type="text/html" title="Weekend Project: Dynamic DNS with AWS Lambda &amp;amp; Jenkins" /><published>2025-03-02T00:00:00-07:00</published><updated>2025-03-02T00:00:00-07:00</updated><id>https://www.johnsosoka.com/blog/2025/03/02/diy-dynamic-dns</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2025/03/02/diy-dynamic-dns.html"><![CDATA[<p>A few months back, in December, I purchased a Ubiquiti UniFi Dream Machine (UDM). This was a significant upgrade from my
previous router, and enabled me to segment my home network into multiple VLANs. I typically do not like to expose services 
from my home network, but with a completely segmented network, I felt comfortable exposing a few services. The first service 
I wanted to expose publicly was a vanilla Java Minecraft server. I already had a domain name (johnsosoka.com) and wanted to
set up dynamic DNS for minecraft.johnsosoka.com.</p>

<h2 id="the-game-plan">The Game Plan</h2>

<p>My personal website is hosted on AWS, and I already have a bit of infrastructure in place, including API Gateway and a
few Lambda functions. Furthermore, I have a server running Jenkins in my home network. Here’s a diagram for the project 
planned for today:</p>

<p><img src="https://media.johnsosoka.com/blog/2025-03-02/jscom-dyn-dns.png" alt="dynamic-dns-diagram" /></p>

<p><em>The diagram above shows the flow of the dynamic DNS service; the home network portion is simplified.</em></p>

<p>The plan is to create two Lambda functions, one to check/return the IP address of the caller, and another to update the
DNS record in Route53. The Jenkins server will have a job that runs periodically to fetch the IP address of the home network,
and if it has changed, it will call the update DNS Lambda function.</p>
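
<p>The overall flow described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the Jenkins job itself: <code class="language-plaintext highlighter-rouge">sync_dns</code> and the three callables passed into it are hypothetical names, and the IP addresses come from the reserved documentation ranges.</p>

```python
def sync_dns(domain, get_public_ip, resolve_dns, update_dns):
    """Update the record only when the public IP differs from what DNS currently returns."""
    public_ip = get_public_ip()
    dns_ip = resolve_dns(domain)
    if public_ip == dns_ip:
        return "unchanged"
    update_dns(domain, public_ip)
    return "updated"


# Stand-in callables so the sketch runs without any network access.
calls = []
status = sync_dns(
    "minecraft.example.com",
    get_public_ip=lambda: "203.0.113.7",
    resolve_dns=lambda domain: "198.51.100.4",
    update_dns=lambda domain, ip: calls.append((domain, ip)),
)
```

<p>Passing the lookups in as callables keeps the comparison testable on its own; in the real setup they correspond to the Check IP endpoint, a DNS query, and the Update DNS endpoint.</p>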

<h2 id="the-lambda-functions">The Lambda Functions</h2>

<p>First, we’ll create the Lambda function that returns the caller’s IP address. It is a short Python script that reads the
source IP from the API Gateway request context. Here’s the code:</p>

<p><strong>Check IP Lambda Function:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">json</span>


<span class="k">def</span> <span class="nf">lambda_handler</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    AWS Lambda function to return the requesting client</span><span class="sh">'</span><span class="s">s IP address.

    This function serves as a lightweight service similar to whatismyip.com,
    retrieving the client</span><span class="sh">'</span><span class="s">s IP from the API Gateway request context. It is
    designed for invocation via API Gateway with Lambda Proxy Integration.

    Args:
        event (dict): Contains the request details including the client</span><span class="sh">'</span><span class="s">s IP.
        context (LambdaContext): Provides runtime information.

    Returns:
        dict: An HTTP response with a JSON body containing the client</span><span class="sh">'</span><span class="s">s IP address.
    </span><span class="sh">"""</span>
    <span class="n">ip_address</span> <span class="o">=</span> <span class="n">event</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">requestContext</span><span class="sh">"</span><span class="p">,</span> <span class="p">{}).</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">http</span><span class="sh">"</span><span class="p">,</span> <span class="p">{}).</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">sourceIp</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">IP not found</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">return</span> <span class="p">{</span>
        <span class="sh">"</span><span class="s">statusCode</span><span class="sh">"</span><span class="p">:</span> <span class="mi">200</span><span class="p">,</span>
        <span class="sh">"</span><span class="s">headers</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">Content-Type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">application/json</span><span class="sh">"</span><span class="p">},</span>
        <span class="sh">"</span><span class="s">body</span><span class="sh">"</span><span class="p">:</span> <span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">({</span><span class="sh">"</span><span class="s">ip</span><span class="sh">"</span><span class="p">:</span> <span class="n">ip_address</span><span class="p">})</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>The above lambda function is incredibly simple, and it will be invoked via API Gateway. I’ll be skipping over the Terraform
code, but it’s available in the GitHub repository for this blog post.</p>
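
<p>One detail worth noting: the handler reads <code class="language-plaintext highlighter-rouge">requestContext.http.sourceIp</code>, which is where API Gateway HTTP APIs (payload format 2.0) put the caller’s address. REST API proxy events carry it under <code class="language-plaintext highlighter-rouge">requestContext.identity.sourceIp</code> instead. A minimal sketch of a lookup that tolerates both shapes, assuming a hypothetical helper named <code class="language-plaintext highlighter-rouge">extract_source_ip</code> and made-up sample events:</p>

```python
def extract_source_ip(event):
    """Return the caller IP from either API Gateway event shape, or None if absent."""
    request_context = event.get("requestContext") or {}
    # HTTP API (payload format 2.0) places the IP under requestContext.http.sourceIp.
    http_ip = (request_context.get("http") or {}).get("sourceIp")
    # REST API proxy events place it under requestContext.identity.sourceIp.
    rest_ip = (request_context.get("identity") or {}).get("sourceIp")
    return http_ip or rest_ip


# Trimmed-down sample events containing only the fields the helper reads.
http_api_event = {"requestContext": {"http": {"sourceIp": "203.0.113.7"}}}
rest_api_event = {"requestContext": {"identity": {"sourceIp": "198.51.100.4"}}}
```

<p>Returning <code class="language-plaintext highlighter-rouge">None</code> when no address is present would also let a caller respond with an error status rather than a placeholder string, if that behavior is preferred.</p>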

<p>Next up is to create the lambda function for updating the DNS record in Route53. Since this is performing a write operation,
I’ll be securing the function with an API key. Here’s the code for the update DNS Lambda function:</p>

<p><strong>Update DNS Lambda Function:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">os</span>
<span class="kn">import</span> <span class="n">json</span>
<span class="kn">import</span> <span class="n">boto3</span>

<span class="k">def</span> <span class="nf">lambda_handler</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    Update a DNS A record in Route53.

    This function supports dynamic DNS updates (for example, updating a Minecraft server</span><span class="sh">'</span><span class="s">s external IP).
    It expects a JSON payload with:
      - domain: the DNS record name (e.g. </span><span class="sh">"</span><span class="s">minecraft.example.com.</span><span class="sh">"</span><span class="s">)
      - ip: the new A record value (e.g. </span><span class="sh">"</span><span class="s">1.2.3.4</span><span class="sh">"</span><span class="s">)

    The authorization token is expected in the request headers (key </span><span class="sh">"</span><span class="s">x-auth-token</span><span class="sh">"</span><span class="s">). The token is verified against
    the AUTH_TOKEN environment variable.

    Returns:
        dict: HTTP response containing a status message.
    </span><span class="sh">"""</span>
    <span class="n">expected_token</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">AUTH_TOKEN</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">hosted_zone_id</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">HOSTED_ZONE_ID</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Retrieve auth token from headers
</span>    <span class="n">headers</span> <span class="o">=</span> <span class="n">event</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">headers</span><span class="sh">"</span><span class="p">,</span> <span class="p">{})</span>
    <span class="n">auth_token</span> <span class="o">=</span> <span class="n">headers</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">x-auth-token</span><span class="sh">"</span><span class="p">)</span>

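    <span class="c1"># Fail closed: if AUTH_TOKEN is unset, a missing header (None) must not match it
</span>    <span class="k">if</span> <span class="ow">not</span> <span class="n">expected_token</span> <span class="ow">or</span> <span class="n">auth_token</span> <span class="o">!=</span> <span class="n">expected_token</span><span class="p">:</span>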
        <span class="k">return</span> <span class="p">{</span>
            <span class="sh">"</span><span class="s">statusCode</span><span class="sh">"</span><span class="p">:</span> <span class="mi">403</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">body</span><span class="sh">"</span><span class="p">:</span> <span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">({</span><span class="sh">"</span><span class="s">error</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Unauthorized</span><span class="sh">"</span><span class="p">})</span>
        <span class="p">}</span>

    <span class="n">body</span> <span class="o">=</span> <span class="n">event</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">body</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">body</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="nf">loads</span><span class="p">(</span><span class="n">body</span><span class="p">)</span>
        <span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
            <span class="k">return</span> <span class="p">{</span>
                <span class="sh">"</span><span class="s">statusCode</span><span class="sh">"</span><span class="p">:</span> <span class="mi">400</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">body</span><span class="sh">"</span><span class="p">:</span> <span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">({</span><span class="sh">"</span><span class="s">error</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Invalid JSON payload</span><span class="sh">"</span><span class="p">})</span>
            <span class="p">}</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">data</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="n">domain</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">domain</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">new_ip</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">ip</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">domain</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">new_ip</span><span class="p">:</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="sh">"</span><span class="s">statusCode</span><span class="sh">"</span><span class="p">:</span> <span class="mi">400</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">body</span><span class="sh">"</span><span class="p">:</span> <span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">({</span><span class="sh">"</span><span class="s">error</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Missing </span><span class="sh">'</span><span class="s">domain</span><span class="sh">'</span><span class="s"> or </span><span class="sh">'</span><span class="s">ip</span><span class="sh">'</span><span class="s"> parameter</span><span class="sh">"</span><span class="p">})</span>
        <span class="p">}</span>

    <span class="k">if</span> <span class="ow">not</span> <span class="n">hosted_zone_id</span><span class="p">:</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="sh">"</span><span class="s">statusCode</span><span class="sh">"</span><span class="p">:</span> <span class="mi">500</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">body</span><span class="sh">"</span><span class="p">:</span> <span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">({</span><span class="sh">"</span><span class="s">error</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Hosted zone ID not configured</span><span class="sh">"</span><span class="p">})</span>
        <span class="p">}</span>

    <span class="n">route53</span> <span class="o">=</span> <span class="n">boto3</span><span class="p">.</span><span class="nf">client</span><span class="p">(</span><span class="sh">"</span><span class="s">route53</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">route53</span><span class="p">.</span><span class="nf">change_resource_record_sets</span><span class="p">(</span>
            <span class="n">HostedZoneId</span><span class="o">=</span><span class="n">hosted_zone_id</span><span class="p">,</span>
            <span class="n">ChangeBatch</span><span class="o">=</span><span class="p">{</span>
                <span class="sh">"</span><span class="s">Comment</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Auto-updated by update_dns_lambda</span><span class="sh">"</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">Changes</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span>
                    <span class="p">{</span>
                        <span class="sh">"</span><span class="s">Action</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">UPSERT</span><span class="sh">"</span><span class="p">,</span>
                        <span class="sh">"</span><span class="s">ResourceRecordSet</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
                            <span class="sh">"</span><span class="s">Name</span><span class="sh">"</span><span class="p">:</span> <span class="n">domain</span><span class="p">,</span>
                            <span class="sh">"</span><span class="s">Type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">A</span><span class="sh">"</span><span class="p">,</span>
                            <span class="sh">"</span><span class="s">TTL</span><span class="sh">"</span><span class="p">:</span> <span class="mi">300</span><span class="p">,</span>
                            <span class="sh">"</span><span class="s">ResourceRecords</span><span class="sh">"</span><span class="p">:</span> <span class="p">[{</span><span class="sh">"</span><span class="s">Value</span><span class="sh">"</span><span class="p">:</span> <span class="n">new_ip</span><span class="p">}]</span>
                        <span class="p">}</span>
                    <span class="p">}</span>
                <span class="p">]</span>
            <span class="p">}</span>
        <span class="p">)</span>
    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="sh">"</span><span class="s">statusCode</span><span class="sh">"</span><span class="p">:</span> <span class="mi">500</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">body</span><span class="sh">"</span><span class="p">:</span> <span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">({</span>
                <span class="sh">"</span><span class="s">error</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Failed to update DNS record</span><span class="sh">"</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">message</span><span class="sh">"</span><span class="p">:</span> <span class="nf">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
            <span class="p">},</span> <span class="n">default</span><span class="o">=</span><span class="nb">str</span><span class="p">)</span>
        <span class="p">}</span>

    <span class="k">return</span> <span class="p">{</span>
        <span class="sh">"</span><span class="s">statusCode</span><span class="sh">"</span><span class="p">:</span> <span class="mi">200</span><span class="p">,</span>
        <span class="sh">"</span><span class="s">body</span><span class="sh">"</span><span class="p">:</span> <span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">({</span>
            <span class="sh">"</span><span class="s">message</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">DNS record updated</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">change_info</span><span class="sh">"</span><span class="p">:</span> <span class="n">response</span>
        <span class="p">},</span> <span class="n">default</span><span class="o">=</span><span class="nb">str</span><span class="p">)</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>There is a bit more going on with the above Lambda function, but it should still be relatively straightforward. The function
will accept a JSON payload with the domain and IP address to update. The function will then update the DNS record in Route53.</p>

<p>I’ve secured this with a simple API key which is expected in the <code class="language-plaintext highlighter-rouge">x-auth-token</code> header and is verified against an environment
variable. In the future, I may revisit this and use a more secure method of authentication. I’ve also considered limiting the
DNS record that can be updated to a specific subdomain, but for now, I’m keeping it simple and allowing any record in the
hosted zone. In the future, there may be other DNS records for self-hosted services that I want to update dynamically.</p>
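
<p>As a rough sketch of the hardening ideas mentioned above, the token check could use a constant-time comparison and the target record could be checked against an allowlist. The <code class="language-plaintext highlighter-rouge">ALLOWED_RECORDS</code> set and both helper names are hypothetical and are not part of the deployed function.</p>

```python
import hmac

# Hypothetical allowlist; the function in this post permits any record in the hosted zone.
ALLOWED_RECORDS = {"minecraft.example.com"}


def token_matches(provided, expected):
    """Constant-time token check that fails closed when either value is missing."""
    if not provided or not expected:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


def record_allowed(domain):
    """Compare case-insensitively and ignore the optional trailing dot of an FQDN."""
    return domain.lower().rstrip(".") in ALLOWED_RECORDS
```

<p>Both checks would run before the Route53 call, so a request with a bad token or an unexpected record name is rejected without touching DNS.</p>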

<h2 id="jenkins-jobs">Jenkins Jobs</h2>

<p>My Jenkins server is running in a Docker container on my home network. It already has a job for posting notifications to
my family’s Discord server. I won’t be covering that job in this post, but it is referenced in the jobs we’ll be building
today.</p>

<p>We’ll be creating two Jenkins jobs: one to check the current public IP address of the home network and another to update
the DNS record. We’ll build the jobs “backwards”: the Update job is called last, but we’ll implement it first so that the
Check IP job can reference it when comparing the current IP address &amp; name record.</p>

<p>The job is parameterized to accept <code class="language-plaintext highlighter-rouge">DNS_DOMAIN</code> and <code class="language-plaintext highlighter-rouge">DNS_IP</code> parameters for updating the DNS record. A secret, <code class="language-plaintext highlighter-rouge">DNS_AUTH_TOKEN</code>,
has been configured in the Jenkins credentials manager. Here’s the code for the Update DNS job:</p>

<p><strong>Update DNS Job:</strong></p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pipeline</span> <span class="o">{</span>
    <span class="n">agent</span> <span class="n">any</span>

    <span class="n">parameters</span> <span class="o">{</span>
        <span class="n">string</span><span class="o">(</span><span class="nl">name:</span> <span class="s1">'DNS_DOMAIN'</span><span class="o">,</span> <span class="nl">defaultValue:</span> <span class="s1">'minecraft.johnsosoka.com'</span><span class="o">,</span> <span class="nl">description:</span> <span class="s1">'The DNS record to update'</span><span class="o">)</span>
        <span class="n">string</span><span class="o">(</span><span class="nl">name:</span> <span class="s1">'DNS_IP'</span><span class="o">,</span> <span class="nl">defaultValue:</span> <span class="s1">'AUTO'</span><span class="o">,</span> <span class="nl">description:</span> <span class="s1">'IP to set (AUTO uses current public IP)'</span><span class="o">)</span>
    <span class="o">}</span>

    <span class="n">stages</span> <span class="o">{</span>
        <span class="n">stage</span><span class="o">(</span><span class="s1">'Install Dependencies'</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">steps</span> <span class="o">{</span>
                <span class="n">script</span> <span class="o">{</span>
                    <span class="n">sh</span> <span class="s1">'''
                        if ! command -v jq &gt;/dev/null 2&gt;&amp;1; then
                            echo "🔧 Installing jq..."
                            apt-get update &amp;&amp; apt-get install -y jq
                        fi
                    '''</span>
                <span class="o">}</span>
            <span class="o">}</span>
        <span class="o">}</span>

        <span class="n">stage</span><span class="o">(</span><span class="s1">'Update DNS'</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">steps</span> <span class="o">{</span>
                <span class="n">withCredentials</span><span class="o">([</span><span class="n">string</span><span class="o">(</span><span class="nl">credentialsId:</span> <span class="s1">'DNS_AUTH_TOKEN'</span><span class="o">,</span> <span class="nl">variable:</span> <span class="s1">'AUTH_TOKEN'</span><span class="o">)])</span> <span class="o">{</span>
                    <span class="n">script</span> <span class="o">{</span>
                        <span class="c1">// withCredentials already exposes AUTH_TOKEN to the sh step; the single-quoted script lets the shell expand it</span>

                        <span class="kt">def</span> <span class="n">response</span> <span class="o">=</span> <span class="n">sh</span><span class="o">(</span><span class="nl">script:</span> <span class="s1">'''
                            JSON_PAYLOAD=$(printf '{
                                "domain": "%s",
                                "ip": "%s"
                            }' "$DNS_DOMAIN" "$DNS_IP")

                            curl -s -X POST "https://api.sosoka.io/v1/utility/dns" \
                            -H "Content-Type: application/json" \
                            -H "x-auth-token: $AUTH_TOKEN" \
                            -d "$JSON_PAYLOAD"
                        '''</span><span class="o">,</span> <span class="nl">returnStdout:</span> <span class="kc">true</span><span class="o">).</span><span class="na">trim</span><span class="o">()</span>

                        <span class="kt">def</span> <span class="n">httpStatus</span> <span class="o">=</span> <span class="n">sh</span><span class="o">(</span><span class="nl">script:</span> <span class="s2">"echo '${response}' | jq -r '.change_info.ResponseMetadata.HTTPStatusCode'"</span><span class="o">,</span> <span class="nl">returnStdout:</span> <span class="kc">true</span><span class="o">).</span><span class="na">trim</span><span class="o">()</span>
                        <span class="kt">def</span> <span class="n">changeStatus</span> <span class="o">=</span> <span class="n">sh</span><span class="o">(</span><span class="nl">script:</span> <span class="s2">"echo '${response}' | jq -r '.change_info.ChangeInfo.Status'"</span><span class="o">,</span> <span class="nl">returnStdout:</span> <span class="kc">true</span><span class="o">).</span><span class="na">trim</span><span class="o">()</span>

                        <span class="k">if</span> <span class="o">(</span><span class="n">httpStatus</span> <span class="o">==</span> <span class="s2">"200"</span> <span class="o">&amp;&amp;</span> <span class="o">(</span><span class="n">changeStatus</span> <span class="o">==</span> <span class="s2">"PENDING"</span> <span class="o">||</span> <span class="n">changeStatus</span> <span class="o">==</span> <span class="s2">"INSYNC"</span><span class="o">))</span> <span class="o">{</span>
                            <span class="n">echo</span> <span class="s2">"✅ DNS updated successfully! Status: ${changeStatus}"</span>
                            <span class="n">currentBuild</span><span class="o">.</span><span class="na">description</span> <span class="o">=</span> <span class="s2">"DNS updated: ${changeStatus}"</span>
                            <span class="n">notifyDiscord</span><span class="o">(</span><span class="s2">"✅ DNS updated for ${DNS_DOMAIN} to ${DNS_IP}. Status: ${changeStatus}"</span><span class="o">)</span>
                        <span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
                            <span class="n">error</span> <span class="s2">"❌ DNS update failed: ${response}"</span>
                        <span class="o">}</span>
                    <span class="o">}</span>
                <span class="o">}</span>
            <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>

<span class="kt">def</span> <span class="nf">notifyDiscord</span><span class="o">(</span><span class="n">message</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">build</span> <span class="nl">job:</span> <span class="s1">'notify-discord'</span><span class="o">,</span> <span class="nl">parameters:</span> <span class="o">[</span>
            <span class="n">string</span><span class="o">(</span><span class="nl">name:</span> <span class="s1">'DISCORD_MESSAGE'</span><span class="o">,</span> <span class="nl">value:</span> <span class="n">message</span><span class="o">)</span>
    <span class="o">]</span>
<span class="o">}</span>
</code></pre></div></div>

<p><em>Note the <code class="language-plaintext highlighter-rouge">notifyDiscord</code> function at the end of the script. It is a small helper that triggers the shared <code class="language-plaintext highlighter-rouge">notify-discord</code> job, which posts messages to Discord.</em></p>
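
<p>The success check that the pipeline performs with <code class="language-plaintext highlighter-rouge">jq</code> can be expressed in Python as well, which may be handy for testing the endpoint outside of Jenkins. This is a sketch under the assumption that the response body has the shape returned by the Update DNS Lambda above; <code class="language-plaintext highlighter-rouge">dns_update_succeeded</code> and the sample payload are made up for the example.</p>

```python
import json


def dns_update_succeeded(response_body):
    """Mirror the pipeline check: HTTP 200 from Route53 and a PENDING or INSYNC change."""
    try:
        change_info = json.loads(response_body)["change_info"]
        http_status = change_info["ResponseMetadata"]["HTTPStatusCode"]
        change_status = change_info["ChangeInfo"]["Status"]
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, an error payload, or an unexpected shape all count as failure.
        return False
    return http_status == 200 and change_status in ("PENDING", "INSYNC")


# Trimmed-down sample containing only the fields the check reads.
sample_body = json.dumps({
    "message": "DNS record updated",
    "change_info": {
        "ResponseMetadata": {"HTTPStatusCode": 200},
        "ChangeInfo": {"Status": "PENDING"},
    },
})
```

<p>Treating every malformed or unexpected response as a failure matches the pipeline’s behavior of erroring out unless both conditions hold.</p>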

<p>Next up is to create the Check IP Job. This job will check the current public IP address of the home network and compare it
to the existing minecraft.johnsosoka.com DNS record. If the IP address has changed, the job will trigger the Update DNS job.</p>

<p><strong>Check IP Job:</strong></p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pipeline</span> <span class="o">{</span>
    <span class="n">agent</span> <span class="n">any</span>

    <span class="n">environment</span> <span class="o">{</span>
        <span class="n">DNS_DOMAIN</span> <span class="o">=</span> <span class="s1">'minecraft.johnsosoka.com'</span>
        <span class="n">PUBLIC_IP_API</span> <span class="o">=</span> <span class="s1">'https://api.sosoka.io/v1/utility/ip'</span>
    <span class="o">}</span>

    <span class="n">stages</span> <span class="o">{</span>
        <span class="n">stage</span><span class="o">(</span><span class="s1">'Install Dependencies'</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">steps</span> <span class="o">{</span>
                <span class="n">script</span> <span class="o">{</span>
                    <span class="n">sh</span> <span class="s1">'''
                        if ! command -v jq &gt; /dev/null 2&gt;&amp;1 || ! command -v dig &gt; /dev/null 2&gt;&amp;1; then
                            echo "🔧 Installing dependencies..."
                            apt-get update &amp;&amp; apt-get install -y jq dnsutils
                        fi
                    '''</span>
                <span class="o">}</span>
            <span class="o">}</span>
        <span class="o">}</span>

        <span class="n">stage</span><span class="o">(</span><span class="s1">'Check Current DNS'</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">steps</span> <span class="o">{</span>
                <span class="n">script</span> <span class="o">{</span>
                    <span class="n">echo</span> <span class="s2">"🔹 Checking DNS record for ${DNS_DOMAIN}..."</span>

                    <span class="c1">// Get the current IP from DNS</span>
                    <span class="kt">def</span> <span class="n">dnsIp</span> <span class="o">=</span> <span class="n">sh</span><span class="o">(</span><span class="nl">script:</span> <span class="s2">"dig +short ${DNS_DOMAIN} | head -n 1"</span><span class="o">,</span> <span class="nl">returnStdout:</span> <span class="kc">true</span><span class="o">).</span><span class="na">trim</span><span class="o">()</span>

                    <span class="c1">// Get the public IP from API</span>
                    <span class="kt">def</span> <span class="n">publicIp</span> <span class="o">=</span> <span class="n">sh</span><span class="o">(</span><span class="nl">script:</span> <span class="s2">"curl -s ${PUBLIC_IP_API} | jq -r '.ip'"</span><span class="o">,</span> <span class="nl">returnStdout:</span> <span class="kc">true</span><span class="o">).</span><span class="na">trim</span><span class="o">()</span>

                    <span class="c1">// Output results</span>
                    <span class="n">echo</span> <span class="s2">"🔹 Current DNS IP: ${dnsIp}"</span>
                    <span class="n">echo</span> <span class="s2">"🔹 Public IP from API: ${publicIp}"</span>

                    <span class="c1">// Check if DNS is outdated</span>
                    <span class="k">if</span> <span class="o">(</span><span class="n">dnsIp</span> <span class="o">==</span> <span class="n">publicIp</span><span class="o">)</span> <span class="o">{</span>
                        <span class="n">echo</span> <span class="s2">"✅ The DNS record is up to date. No action needed."</span>
                    <span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
                        <span class="n">echo</span> <span class="s2">"⚠️ DNS record is outdated. Updating to ${publicIp}..."</span>

                        <span class="c1">// Trigger the update-jscom-dns job</span>
                        <span class="n">build</span> <span class="nl">job:</span> <span class="s1">'update-jscom-dns'</span><span class="o">,</span> <span class="nl">parameters:</span> <span class="o">[</span>
                            <span class="n">string</span><span class="o">(</span><span class="nl">name:</span> <span class="s1">'DNS_DOMAIN'</span><span class="o">,</span> <span class="nl">value:</span> <span class="n">DNS_DOMAIN</span><span class="o">),</span>
                            <span class="n">string</span><span class="o">(</span><span class="nl">name:</span> <span class="s1">'DNS_IP'</span><span class="o">,</span> <span class="nl">value:</span> <span class="n">publicIp</span><span class="o">)</span>
                        <span class="o">]</span>

                        <span class="c1">// Notify Discord about the update</span>
                        <span class="n">notifyDiscord</span><span class="o">(</span><span class="s2">"⚠️ DNS Record Change Detected: ${DNS_DOMAIN} being routed to ${publicIp}"</span><span class="o">)</span>
                    <span class="o">}</span>
                <span class="o">}</span>
            <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>

<span class="c1">// Function to notify Discord</span>
<span class="kt">def</span> <span class="nf">notifyDiscord</span><span class="o">(</span><span class="n">message</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">build</span> <span class="nl">job:</span> <span class="s1">'notify-discord'</span><span class="o">,</span> <span class="nl">parameters:</span> <span class="o">[</span>
        <span class="n">string</span><span class="o">(</span><span class="nl">name:</span> <span class="s1">'DISCORD_MESSAGE'</span><span class="o">,</span> <span class="nl">value:</span> <span class="n">message</span><span class="o">)</span>
    <span class="o">]</span>
<span class="o">}</span>
</code></pre></div></div>

<p>I’ve configured the above job to run every hour on the hour with a cron schedule <code class="language-plaintext highlighter-rouge">0 * * * *</code>. To test this out, I’ve set 
the DNS record to <code class="language-plaintext highlighter-rouge">127.0.0.1</code> and then executed the job. Here’s the truncated output from the Jenkins console:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>🔹 Checking DNS record for minecraft.johnsosoka.com...
[Pipeline] sh
+ dig +short minecraft.johnsosoka.com
+ head -n 1
[Pipeline] sh
+ curl -s https://api.sosoka.io/v1/utility/ip
+ jq -r .ip
[Pipeline] echo
🔹 Current DNS IP: 127.0.0.1
[Pipeline] echo
🔹 Public IP from API: 24.117.184.224
[Pipeline] echo
⚠️ DNS record is outdated. Updating to 24.117.184.224...
[Pipeline] build (Building update-jscom-dns)
Scheduling project: update-jscom-dns
Starting building: update-jscom-dns #15
Build update-jscom-dns #15 completed: SUCCESS
[Pipeline] build (Building notify-discord)
Scheduling project: notify-discord
Starting building: notify-discord #24
Build notify-discord #24 completed: SUCCESS
</code></pre></div></div>

<p>The job successfully detected that the DNS record was outdated and triggered the Update DNS job. The Update DNS job then
successfully updated the DNS record in Route53 and posted a message to my Discord server!</p>
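
<p>One edge case the comparison above does not cover: if the IP lookup fails, <code class="language-plaintext highlighter-rouge">publicIp</code> may come back empty or as the literal string <code class="language-plaintext highlighter-rouge">null</code>, and the job would still trigger an update. Below is a minimal sketch of a guard for that case, written in Java purely for illustration. The <code class="language-plaintext highlighter-rouge">IpCheck</code> class and its <code class="language-plaintext highlighter-rouge">Action</code> enum are hypothetical names and are not part of the pipeline.</p>

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original Jenkins job.
final class IpCheck {

    enum Action { NONE, UPDATE, SKIP_INVALID }

    // Loose IPv4 shape check: four dot-separated groups of 1-3 digits.
    private static final Pattern IPV4 = Pattern.compile("^\\d{1,3}(\\.\\d{1,3}){3}$");

    static boolean looksLikeIpv4(String value) {
        if (value == null || !IPV4.matcher(value.trim()).matches()) {
            return false;
        }
        // Each octet must be in the range 0-255.
        for (String part : value.trim().split("\\.")) {
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    // Decide what the job should do given the DNS answer and the API answer.
    static Action decide(String dnsIp, String publicIp) {
        if (!looksLikeIpv4(publicIp)) {
            return Action.SKIP_INVALID; // e.g. "" or "null" after a failed lookup
        }
        String dns = dnsIp == null ? "" : dnsIp.trim();
        return publicIp.trim().equals(dns) ? Action.NONE : Action.UPDATE;
    }

    public static void main(String[] args) {
        System.out.println(decide("127.0.0.1", "24.117.184.224"));
        System.out.println(decide("24.117.184.224", "24.117.184.224"));
        System.out.println(decide("127.0.0.1", "null"));
    }
}
```

<p>The same idea could be expressed in the pipeline itself by checking <code class="language-plaintext highlighter-rouge">publicIp</code> before the <code class="language-plaintext highlighter-rouge">if</code>/<code class="language-plaintext highlighter-rouge">else</code> block and skipping the update when it does not look like an address.</p>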

<h2 id="conclusion">Conclusion</h2>

<p>This was a fun weekend project that I’ve been wanting to do for a while, and I’m glad I finally got around to it. I’ll 
be able to re-use much of this infrastructure for other self-hosted services in the future. I may eventually restrict
which domains can be updated by the Lambda function, but for now, I’m keeping it simple as nothing I host is mission-critical. 
Another future improvement will be to host the pipeline DSL in a Jenkinsfile in the GitHub repository for this project, 
instead of directly in the Jenkins job configuration.</p>

<p>Hopefully this post has been helpful to you, and if you have any questions or suggestions, feel free to reach out via the
<a href="https://www.johnsosoka.com/contact/">contact form</a>.</p>

<p>The full code for this project, including the Terraform, can be found on <a href="https://github.com/johnsosoka/jscom-mini-services/tree/main">GitHub</a></p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="dns" /><category term="aws" /><category term="lambda" /><category term="minecraft" /><category term="dynamic-dns" /><category term="dynamic" /><summary type="html"><![CDATA[A few months back, in December, I purchased a Unifi Ubiquity Dream Machine (UDM). This was a significant upgrade from my previous router, and enabled me to segment my home network into multiple VLANs. I typically do not like to expose services from my home network, but with a completely segmented network, I felt comfortable exposing a few services. The first service I wanted to expose publicly was a vanilla Java Minecraft server. I already had a domain name (johnsosoka.com) and wanted to set up dynamic DNS for minecraft.johnsosoka.com.]]></summary></entry><entry><title type="html">Happy New Year!</title><link href="https://www.johnsosoka.com/blog/2025/01/04/happy-new-year.html" rel="alternate" type="text/html" title="Happy New Year!" /><published>2025-01-04T00:00:00-07:00</published><updated>2025-01-04T00:00:00-07:00</updated><id>https://www.johnsosoka.com/blog/2025/01/04/happy-new-year</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2025/01/04/happy-new-year.html"><![CDATA[<p>🎉 Happy New Year!</p>

<p>2024 was a big year for AI, Agents, and me. I’m excited to see what 2025 has in store. I started experimenting with LLMs
via OpenAI in late 2022, and built my first LLM application around mid 2023. Around the start of 2024, I had the wonderful
privilege and opportunity to focus on AI/LLM Development full-time with <a href="https://www.commerce-architects.com/">Commerce Architects</a>.</p>

<p>This past year, building Agents &amp; Multi-Actor systems really began to take off across the industry. While I cannot share 
details on my personal blog, we have shipped some really cool Agent-based applications to production for a handful of 
clients. Our labs team produced an AI Agent that interfaces with the <a href="https://commercetools.com">commercetools</a> platform 
that was named “Accelerator of the Week” by commercetools. I’m really proud of the work we’ve done and the team we’ve built.</p>

<p>I’ve been trying to share more knowledge as I acquire it. I published a <a href="https://www.commerce-architects.com/post/the-basic-building-blocks-of-agents">LangChain4J introduction</a>, 
a Python LangGraph <a href="https://github.com/johnsosoka/langgraph-model-router">LLM-based query/model router</a>, and an
<a href="https://johnsosoka.com/blog/2024/02/10/implement-self-discover.html">example implementation</a> of the DeepMind SELF-DISCOVER
algorithm.</p>

<p>I’m looking forward to learning &amp; sharing more in 2025. I think that we are on the cusp of a knowledge revolution, and this 
year will yield some fascinating new software patterns &amp; breakthroughs around AI &amp; Agent design/orchestration.</p>

<p>Here’s to a fun and interesting 2025!</p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="review" /><category term="housekeeping" /><category term="happy-new-year" /><category term="2025" /><category term="2024" /><summary type="html"><![CDATA[🎉 Happy New Year!]]></summary></entry><entry><title type="html">Unit Testing Large Language Models: Agentic Test Evaluation with LangChain4J</title><link href="https://www.johnsosoka.com/blog/2024/07/21/unit-testing-llms.html" rel="alternate" type="text/html" title="Unit Testing Large Language Models: Agentic Test Evaluation with LangChain4J" /><published>2024-07-21T00:00:00-06:00</published><updated>2024-07-21T00:00:00-06:00</updated><id>https://www.johnsosoka.com/blog/2024/07/21/unit-testing-llms</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2024/07/21/unit-testing-llms.html"><![CDATA[<p><img src="https://media.johnsosoka.com/blog/2024-07-21/unit-test-llms.png" alt="unit-test-llms-png" /></p>

<hr />

<p><strong>Note:</strong> This article assumes familiarity with LangChain4j, an LLM Integration framework. For a primer on this library,
you can read an introduction that I wrote <a href="https://www.commerce-architects.com/post/the-basic-building-blocks-of-agents">here</a>.</p>

<hr />

<p>Unit tests are a critical part of enterprise software development. Not only do unit tests help validate the expected
behavior of the code, but they also serve as a form of documentation and give developers the confidence to refactor
and contribute to the codebase. I have worked on software projects lacking unit tests, and have seen the negative
impact on developer confidence &amp; productivity.</p>

<p>Testing Large Language Models (LLMs) is a unique challenge, particularly because of the non-deterministic nature of 
these models. It isn’t always as simple as asserting that the output of a function is equal to an expected value, as 
there can be many ways for an LLM to phrase a correct answer. In today’s post, I will walk through 
a handful of strategies for unit testing LLMs <em>with</em> LLMs. We will start simple, and then build our way up to
a <code class="language-plaintext highlighter-rouge">MultiPhaseEvaluator</code>, which guides a test agent through creating a test plan, executing that plan 
(agent-to-agent interaction), and then evaluating the results.</p>

<h2 id="setup">Setup</h2>

<p>To evaluate LLM performance in Unit Tests, we’re going to need something to test. To achieve this, I will be 
recreating the Hotel Booking Agent example that I built with Spring AI in a previous article. You can read the original 
blog post <a href="/blog/2024/03/24/Spring-AI.html">here</a>. The project contains a simple hotel booking agent with access to tools to check availability, 
book rooms, and look up reservations.</p>

<p>The first thing I’ve done is copy the existing dummy <a href="https://github.com/johnsosoka/code-examples/blob/main/java/spring-ai-booking/src/main/java/com/johnsosoka/springaibooking/service/HotelBookingService.java">HotelBookingService</a> 
class from the Spring AI project. This class contains the logic for checking availability, booking rooms, and looking up 
reservations. Once copied, I needed to define the LangChain4J toolkit, which will be exposed to the booking agent. It 
simply wraps the <code class="language-plaintext highlighter-rouge">HotelBookingService</code>:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Component</span>
<span class="nd">@RequiredArgsConstructor</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">BookingTools</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">HotelBookingService</span> <span class="n">hotelBookingService</span><span class="o">;</span>
    
    <span class="nd">@Tool</span><span class="o">(</span><span class="s">"Check Availability -- Useful for seeing if a room is available for a given date."</span><span class="o">)</span>
    <span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">checkAvailability</span><span class="o">(</span><span class="nc">String</span> <span class="n">date</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">LocalDate</span> <span class="n">parsedDate</span> <span class="o">=</span> <span class="nc">LocalDate</span><span class="o">.</span><span class="na">parse</span><span class="o">(</span><span class="n">date</span><span class="o">);</span>
        <span class="k">return</span> <span class="n">hotelBookingService</span><span class="o">.</span><span class="na">isAvailable</span><span class="o">(</span><span class="n">parsedDate</span><span class="o">);</span>
    <span class="o">}</span>

    <span class="nd">@Tool</span><span class="o">(</span><span class="s">"Book Room -- Useful for booking a room for a given guest name, check-in date, and check-out date."</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">String</span> <span class="nf">bookRoom</span><span class="o">(</span><span class="nc">String</span> <span class="n">guestName</span><span class="o">,</span> <span class="nc">String</span> <span class="n">checkInDate</span><span class="o">,</span> <span class="nc">String</span> <span class="n">checkOutDate</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">LocalDate</span> <span class="n">checkIn</span> <span class="o">=</span> <span class="nc">LocalDate</span><span class="o">.</span><span class="na">parse</span><span class="o">(</span><span class="n">checkInDate</span><span class="o">);</span>
        <span class="nc">LocalDate</span> <span class="n">checkOut</span> <span class="o">=</span> <span class="nc">LocalDate</span><span class="o">.</span><span class="na">parse</span><span class="o">(</span><span class="n">checkOutDate</span><span class="o">);</span>
        <span class="k">return</span> <span class="n">hotelBookingService</span><span class="o">.</span><span class="na">bookRoom</span><span class="o">(</span><span class="n">guestName</span><span class="o">,</span> <span class="n">checkIn</span><span class="o">,</span> <span class="n">checkOut</span><span class="o">);</span>
    <span class="o">}</span>

    <span class="nd">@Tool</span><span class="o">(</span><span class="s">"Find Booking -- Useful for finding a booking by guest name."</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">String</span> <span class="nf">findBooking</span><span class="o">(</span><span class="nc">String</span> <span class="n">guestName</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">return</span> <span class="n">hotelBookingService</span><span class="o">.</span><span class="na">findBookingByGuestNameStr</span><span class="o">(</span><span class="n">guestName</span><span class="o">);</span>
    <span class="o">}</span>

<span class="o">}</span>
</code></pre></div></div>
<p><br />
Next up, I’ll define the LangChain4J <a href="https://docs.langchain4j.dev/tutorials/ai-services/">AI Service</a>. This interface defines 
the role of the agent and serves as the entrypoint for interacting with the LLM. When the service is built, we can attach the
<code class="language-plaintext highlighter-rouge">@Tool</code> methods defined above, exposing them to the <code class="language-plaintext highlighter-rouge">BookingAgent</code> which is to be tested.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">package</span> <span class="nn">com.johnsosoka.langchainbookingtests.agent</span><span class="o">;</span>

<span class="kn">import</span> <span class="nn">dev.langchain4j.service.SystemMessage</span><span class="o">;</span>

<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">BookingAgent</span> <span class="o">{</span>

    <span class="nd">@SystemMessage</span><span class="o">({</span>
            <span class="s">"You are a booking agent for an online hotel. You are here to help customers book rooms and check "</span><span class="o">,</span>
            <span class="s">"availability. Use the tools you have access to in order to help customers with their requests. You can "</span><span class="o">,</span>
            <span class="s">"check availability, book rooms, and find bookings."</span>
    <span class="o">})</span>
    <span class="nc">String</span> <span class="nf">chat</span><span class="o">(</span><span class="nc">String</span> <span class="n">message</span><span class="o">);</span>
<span class="o">}</span>

</code></pre></div></div>
<p><br />
In a Spring configuration class, we will equip the agent with a toolkit, large language model (GPT-4o), and a ChatMemory.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Configuration</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">BookingAgentConfig</span> <span class="o">{</span>

    <span class="nd">@Value</span><span class="o">(</span><span class="s">"${openai.api-key}"</span><span class="o">)</span>
    <span class="nc">String</span> <span class="n">apiKey</span><span class="o">;</span>

    <span class="nd">@Bean</span>
    <span class="kd">public</span> <span class="nc">ChatLanguageModel</span> <span class="nf">chatLanguageModel</span><span class="o">()</span> <span class="o">{</span>
        <span class="k">return</span> <span class="nc">OpenAiChatModel</span><span class="o">.</span><span class="na">builder</span><span class="o">()</span>
                <span class="o">.</span><span class="na">modelName</span><span class="o">(</span><span class="nc">OpenAiChatModelName</span><span class="o">.</span><span class="na">GPT_4_O</span><span class="o">)</span>
                <span class="o">.</span><span class="na">apiKey</span><span class="o">(</span><span class="n">apiKey</span><span class="o">)</span>
                <span class="o">.</span><span class="na">build</span><span class="o">();</span>
    <span class="o">}</span>

    <span class="nd">@Bean</span>
    <span class="kd">public</span> <span class="nc">BookingAgent</span> <span class="nf">bookingAgent</span><span class="o">(</span><span class="nc">BookingTools</span> <span class="n">bookingTools</span><span class="o">,</span> <span class="nc">ChatLanguageModel</span> <span class="n">chatLanguageModel</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">return</span> <span class="nc">AiServices</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="nc">BookingAgent</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>
                <span class="o">.</span><span class="na">chatLanguageModel</span><span class="o">(</span><span class="n">chatLanguageModel</span><span class="o">)</span>
                <span class="o">.</span><span class="na">tools</span><span class="o">(</span><span class="n">bookingTools</span><span class="o">)</span>
                <span class="o">.</span><span class="na">chatMemory</span><span class="o">(</span><span class="nc">MessageWindowChatMemory</span><span class="o">.</span><span class="na">withMaxMessages</span><span class="o">(</span><span class="mi">50</span><span class="o">))</span>
                <span class="o">.</span><span class="na">build</span><span class="o">();</span>
    <span class="o">}</span>

<span class="o">}</span>
</code></pre></div></div>
<p><br />
Finally, I will create an additional service class that will be used to interact with the agent. Remember, we’re just
setting up a dummy application so that we have something to test. This is not a production-ready application: every caller
shares the agent’s single <code class="language-plaintext highlighter-rouge">ChatMemory</code>, so it will not support concurrent conversations.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">package</span> <span class="nn">com.johnsosoka.langchainbookingtests.service</span><span class="o">;</span>

<span class="kn">import</span> <span class="nn">com.johnsosoka.langchainbookingtests.agent.BookingAgent</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">lombok.RequiredArgsConstructor</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.springframework.stereotype.Service</span><span class="o">;</span>

<span class="nd">@Service</span>
<span class="nd">@RequiredArgsConstructor</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">ChatService</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">BookingAgent</span> <span class="n">bookingAgent</span><span class="o">;</span>

    <span class="kd">public</span> <span class="nc">String</span> <span class="nf">chat</span><span class="o">(</span><span class="nc">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">return</span> <span class="n">bookingAgent</span><span class="o">.</span><span class="na">chat</span><span class="o">(</span><span class="n">message</span><span class="o">);</span>
    <span class="o">}</span>

<span class="o">}</span>
</code></pre></div></div>
<p><br />
The Spring AI hotel booking agent has now been migrated to LangChain4J! We can now begin writing unit tests for the agent.</p>

<h2 id="unit-testing">Unit Testing</h2>

<p>The <code class="language-plaintext highlighter-rouge">HotelBookingService</code> has two hardcoded dates: January 15, 2025 (available) and February 28, 2025 (unavailable). We can
use these dates to test the agent’s ability to check availability, book rooms, and find bookings.</p>
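
<p>To make the date handling concrete, here is a minimal sketch of a stand-in for the availability check. It is not the real <code class="language-plaintext highlighter-rouge">HotelBookingService</code> (that lives in the linked repository); <code class="language-plaintext highlighter-rouge">StubAvailability</code> is a hypothetical name, and treating every date other than January 15, 2025 as unavailable is an assumption. What it does mirror from <code class="language-plaintext highlighter-rouge">BookingTools</code> is the use of <code class="language-plaintext highlighter-rouge">LocalDate.parse</code>, which expects ISO-8601 dates such as <code class="language-plaintext highlighter-rouge">2025-01-15</code>.</p>

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical stand-in for illustration; not the author's HotelBookingService.
final class StubAvailability {

    // The one date the post describes as available.
    static final LocalDate AVAILABLE_DATE = LocalDate.of(2025, 1, 15);

    // Assumption: any date other than AVAILABLE_DATE is treated as unavailable.
    static boolean isAvailable(LocalDate date) {
        return AVAILABLE_DATE.equals(date);
    }

    // Same shape as the tool method: parse an ISO-8601 string, then delegate.
    static boolean checkAvailability(String date) {
        return isAvailable(LocalDate.parse(date));
    }

    public static void main(String[] args) {
        System.out.println(checkAvailability("2025-01-15"));
        System.out.println(checkAvailability("2025-02-28"));
        try {
            // Non-ISO input is rejected by LocalDate.parse.
            checkAvailability("01/15/2025");
        } catch (DateTimeParseException e) {
            System.out.println("rejected: " + e.getParsedString());
        }
    }
}
```

<p>Because the tool accepts a plain string, the agent has to supply the date in ISO format; a differently formatted date would raise an exception rather than return a result.</p>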

<h3 id="testing-without-agents">Testing Without Agents</h3>

<p>To begin, I’ll set up an integration test for the <code class="language-plaintext highlighter-rouge">ChatService</code>, and evaluate the response using <code class="language-plaintext highlighter-rouge">contains</code> to assert that
the agent’s response contains the expected output.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@SpringBootTest</span>
<span class="nd">@Slf4j</span>
<span class="kd">class</span> <span class="nc">ChatServiceTestIT</span> <span class="o">{</span>

    <span class="nd">@Autowired</span>
    <span class="kd">private</span> <span class="nc">ChatService</span> <span class="n">chatService</span><span class="o">;</span>

    <span class="nd">@Test</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">checkAvailability</span><span class="o">()</span> <span class="o">{</span>
        <span class="nc">String</span> <span class="n">response</span> <span class="o">=</span> <span class="n">chatService</span><span class="o">.</span><span class="na">chat</span><span class="o">(</span><span class="s">"Is the hotel available on 2022-12-12?"</span><span class="o">);</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Response: {}"</span><span class="o">,</span> <span class="n">response</span><span class="o">);</span>
        <span class="n">assertTrue</span><span class="o">(</span><span class="n">response</span><span class="o">.</span><span class="na">contains</span><span class="o">(</span><span class="s">"not available"</span><span class="o">));</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p><br />
When this test executes, the agent will respond with a message indicating that the hotel is not available on the given date.
Here are sample outputs from three different execution runs of the test:</p>

<pre><code class="language-commandline">Response: The hotel is not available on 2022-12-12. Would you like to check for other dates or make a 
booking for different dates?
</code></pre>
<p><br /></p>
<pre><code class="language-commandline">Response: The hotel is not available on 2022-12-12. Would you like to check for alternative dates or 
make a booking for a different date?
</code></pre>
<p><br /></p>
<pre><code class="language-commandline">Response: I'm sorry, but the hotel is not available on 2022-12-12. Is there another date you would 
like to check for availability?
</code></pre>
<p><br />
You can see that the agent’s response varies slightly between runs, due to the non-deterministic nature of the language model. While we 
could assert that the response contains the words “not available,” this would be a brittle test: an equally correct reply phrased as
“unavailable” or “fully booked” would fail it. Instead, we can use a more robust approach by creating an LLM Evaluator agent.</p>
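
<p>A small sketch of that brittleness follows. The first response is taken from the runs above; the other three are hypothetical phrasings that I made up for illustration, each of which a human would accept as a correct answer. <code class="language-plaintext highlighter-rouge">BrittleContainsDemo</code> and <code class="language-plaintext highlighter-rouge">naiveCheck</code> are likewise hypothetical names.</p>

```java
// Illustration only: shows how a substring assertion reacts to rephrasing.
final class BrittleContainsDemo {

    static final String[] RESPONSES = {
        "The hotel is not available on 2022-12-12.",              // observed phrasing
        "Unfortunately, the hotel is unavailable on 2022-12-12.", // hypothetical
        "We are fully booked on 2022-12-12.",                     // hypothetical
        "The hotel is Not Available on 2022-12-12."               // hypothetical
    };

    // The same check the first test performs.
    static boolean naiveCheck(String response) {
        return response.contains("not available");
    }

    public static void main(String[] args) {
        for (String response : RESPONSES) {
            System.out.println(naiveCheck(response) + " : " + response);
        }
    }
}
```

<p>Only the first response satisfies the substring check, even though all four communicate the same thing.</p>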

<h3 id="simple-agent-based-evaluation">Simple Agent-Based Evaluation</h3>

<p>We can use an agent-based approach to evaluate the agent’s responses. This approach involves creating an agent that can 
be tasked with evaluating the responses of another agent. The evaluator agent will be provided with the conditions that
the response must meet as well as the response itself to evaluate.</p>

<p>Let’s first define the <code class="language-plaintext highlighter-rouge">TestEvaluationAgent</code> interface:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">TestEvaluationAgent</span> <span class="o">{</span>

    <span class="nd">@SystemMessage</span><span class="o">({</span>
            <span class="s">"Your purpose is to evaluate the results of a test. You will be employed in a unit testing environment, "</span><span class="o">,</span>
            <span class="s">"and must critically evaluate the provided conditions and results to determine if the test has passed or "</span><span class="o">,</span>
            <span class="s">"failed. Consider a passing test True, and a failing test False."</span>
    <span class="o">})</span>
    <span class="nd">@UserMessage</span><span class="o">({</span>
            <span class="s">"Evaluate the following:\n"</span><span class="o">,</span>
            <span class="s">"Condition: {{condition}}\n"</span><span class="o">,</span>
            <span class="s">"-----\n"</span><span class="o">,</span>
            <span class="s">"Results: {{result}}"</span><span class="o">,</span>
    <span class="o">})</span>
    <span class="kd">public</span> <span class="nc">Boolean</span> <span class="nf">evaluate</span><span class="o">(</span><span class="nd">@V</span><span class="o">(</span><span class="s">"condition"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">condition</span><span class="o">,</span> <span class="nd">@V</span><span class="o">(</span><span class="s">"result"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">result</span><span class="o">);</span>
<span class="o">}</span>

</code></pre></div></div>
<p><br />
In the above, you can see how we’re defining the “profile” or “role” of the <code class="language-plaintext highlighter-rouge">TestEvaluationAgent</code>. The <code class="language-plaintext highlighter-rouge">@SystemMessage</code>
annotation clearly explains to the LLM what its purpose is. The <code class="language-plaintext highlighter-rouge">@UserMessage</code> annotation provides a template for the
agent to use when evaluating the results.</p>
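
<p>To picture what the evaluator ends up reading, here is a rough sketch of the placeholder substitution. This is not LangChain4J’s implementation, and the exact whitespace of the real prompt may differ; <code class="language-plaintext highlighter-rouge">PromptTemplateSketch</code> is a hypothetical class that simply fills the <code class="language-plaintext highlighter-rouge">{{condition}}</code> and <code class="language-plaintext highlighter-rouge">{{result}}</code> slots with plain string replacement.</p>

```java
// Illustration only: a simplified view of how the template variables are filled.
final class PromptTemplateSketch {

    // Approximation of the user message template from TestEvaluationAgent.
    static final String TEMPLATE = String.join("\n",
            "Evaluate the following:",
            "Condition: {{condition}}",
            "-----",
            "Results: {{result}}");

    // Substitute the two named placeholders with the supplied values.
    static String render(String condition, String result) {
        return TEMPLATE
                .replace("{{condition}}", condition)
                .replace("{{result}}", result);
    }

    public static void main(String[] args) {
        System.out.println(render(
                "It should be determined that there are no hotel rooms available on 2025-02-28",
                "I'm sorry, but the hotel is not available on 2025-02-28."));
    }
}
```

<p>The evaluator never sees the original question, only the condition and the response, so the condition needs to carry enough context to be judged on its own.</p>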

<p>For a quick test, let’s wire up this agent to evaluate the response of the <code class="language-plaintext highlighter-rouge">ChatService</code> test we wrote earlier:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">...</span>
    <span class="nd">@Autowired</span>
    <span class="kd">private</span> <span class="nc">ChatLanguageModel</span> <span class="n">chatLanguageModel</span><span class="o">;</span>

    <span class="kd">private</span> <span class="nc">TestEvaluationAgent</span> <span class="n">testEvaluationAgent</span><span class="o">;</span>

    <span class="nd">@BeforeEach</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">setUp</span><span class="o">(){</span>
        <span class="n">testEvaluationAgent</span> <span class="o">=</span> <span class="n">provisionEvaluationAgent</span><span class="o">();</span>
    <span class="o">}</span>

    <span class="nd">@Test</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">checkAvailability_withTestEvaluationAgent</span><span class="o">()</span> <span class="o">{</span>
        <span class="nc">String</span> <span class="n">response</span> <span class="o">=</span> <span class="n">chatService</span><span class="o">.</span><span class="na">chat</span><span class="o">(</span><span class="s">"Is the hotel available on 2025-02-28?"</span><span class="o">);</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Response: {}"</span><span class="o">,</span> <span class="n">response</span><span class="o">);</span>

        <span class="nc">String</span> <span class="n">condition</span> <span class="o">=</span> <span class="s">"It should be determined that there are no hotel rooms available on 2025-02-28"</span><span class="o">;</span>
        <span class="nc">Boolean</span> <span class="n">evaluationResult</span> <span class="o">=</span> <span class="n">testEvaluationAgent</span><span class="o">.</span><span class="na">evaluate</span><span class="o">(</span><span class="n">condition</span><span class="o">,</span> <span class="n">response</span><span class="o">);</span>
        <span class="n">assertTrue</span><span class="o">(</span><span class="n">evaluationResult</span><span class="o">);</span>
    <span class="o">}</span>


    <span class="kd">private</span> <span class="nc">TestEvaluationAgent</span> <span class="nf">provisionEvaluationAgent</span><span class="o">()</span> <span class="o">{</span>
        <span class="k">return</span> <span class="nc">AiServices</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="nc">TestEvaluationAgent</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>
                <span class="o">.</span><span class="na">chatLanguageModel</span><span class="o">(</span><span class="n">chatLanguageModel</span><span class="o">)</span>
                <span class="o">.</span><span class="na">chatMemory</span><span class="o">(</span><span class="nc">MessageWindowChatMemory</span><span class="o">.</span><span class="na">withMaxMessages</span><span class="o">(</span><span class="mi">10</span><span class="o">))</span>
                <span class="o">.</span><span class="na">build</span><span class="o">();</span>
    <span class="o">}</span>
<span class="o">...</span>
</code></pre></div></div>
<p><br />
The above test passes the response of the <code class="language-plaintext highlighter-rouge">BookingAgent</code> to the <code class="language-plaintext highlighter-rouge">TestEvaluationAgent</code>, along with the condition
it must satisfy. The <code class="language-plaintext highlighter-rouge">TestEvaluationAgent</code> then evaluates the response and returns a boolean value: <code class="language-plaintext highlighter-rouge">true</code>
if the response meets the condition, and <code class="language-plaintext highlighter-rouge">false</code> if it does not.</p>

<p>You may have noticed that the <code class="language-plaintext highlighter-rouge">TestEvaluationAgent</code> is provisioned using the existing <code class="language-plaintext highlighter-rouge">ChatLanguageModel</code> defined in the
Spring configuration from earlier. It is worth noting that developers are not limited to re-using models. There are new
fine-tuned models being released frequently that may be better suited for specific tasks like hallucination detection or
critiquing.</p>

<h3 id="multi-pass-agent-evaluation">Multi-Pass Agent Evaluation</h3>

<p>Our <code class="language-plaintext highlighter-rouge">TestEvaluationAgent</code> is also subject to the non-deterministic nature of the language model. To mitigate this, we can
use a multi-pass evaluation strategy. This strategy will involve evaluating the result multiple times and taking the
majority vote as the final result.</p>

<p>Below is a simple implementation of the multi-pass evaluation strategy:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Builder</span>
<span class="nd">@Slf4j</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MultiPassEvaluator</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="nc">TestEvaluationAgent</span> <span class="n">testEvaluationAgent</span><span class="o">;</span>
    <span class="c1">// The total number of times to evaluate the result</span>
    <span class="kd">private</span> <span class="nc">Integer</span> <span class="n">passCount</span><span class="o">;</span>

    <span class="kd">public</span> <span class="nc">Boolean</span> <span class="nf">evaluate</span><span class="o">(</span><span class="nc">String</span> <span class="n">condition</span><span class="o">,</span> <span class="nc">String</span> <span class="n">result</span><span class="o">)</span> <span class="o">{</span>
        <span class="kt">int</span> <span class="n">successCount</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
        <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">passCount</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
            <span class="kt">boolean</span> <span class="n">evaluation</span> <span class="o">=</span> <span class="n">testEvaluationAgent</span><span class="o">.</span><span class="na">evaluate</span><span class="o">(</span><span class="n">condition</span><span class="o">,</span> <span class="n">result</span><span class="o">);</span>
            <span class="k">if</span> <span class="o">(</span><span class="n">evaluation</span><span class="o">)</span> <span class="o">{</span>
                <span class="n">successCount</span><span class="o">++;</span>
                <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Evaluation {} passed"</span><span class="o">,</span> <span class="n">i</span><span class="o">);</span>
            <span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
                <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Evaluation {} failed"</span><span class="o">,</span> <span class="n">i</span><span class="o">);</span>
            <span class="o">}</span>
        <span class="o">}</span>
        <span class="c1">// The test is considered successful only if a strict majority of the evaluations pass</span>
        <span class="k">return</span> <span class="n">successCount</span> <span class="o">&gt;</span> <span class="n">passCount</span> <span class="o">/</span> <span class="mi">2</span><span class="o">;</span>
    <span class="o">}</span>

<span class="o">}</span>
</code></pre></div></div>
<p><br />
I’ll wire this up to another test:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">...</span>
<span class="nd">@BeforeEach</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setUp</span><span class="o">(){</span>
    <span class="n">testEvaluationAgent</span> <span class="o">=</span> <span class="n">provisionEvaluationAgent</span><span class="o">();</span>
    <span class="n">multiPassEvaluator</span> <span class="o">=</span> <span class="nc">MultiPassEvaluator</span><span class="o">.</span><span class="na">builder</span><span class="o">()</span>
            <span class="o">.</span><span class="na">testEvaluationAgent</span><span class="o">(</span><span class="n">testEvaluationAgent</span><span class="o">)</span>
            <span class="o">.</span><span class="na">passCount</span><span class="o">(</span><span class="mi">3</span><span class="o">)</span>
            <span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="o">}</span>

<span class="nd">@Test</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">checkAvailability_withMultiPassEvaluator</span><span class="o">()</span> <span class="o">{</span>
    <span class="nc">String</span> <span class="n">response</span> <span class="o">=</span> <span class="n">chatService</span><span class="o">.</span><span class="na">chat</span><span class="o">(</span><span class="s">"Is the hotel available on 2025-02-28?"</span><span class="o">);</span>
    <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Response: {}"</span><span class="o">,</span> <span class="n">response</span><span class="o">);</span>

    <span class="nc">String</span> <span class="n">condition</span> <span class="o">=</span> <span class="s">"It should be determined that there are no hotel rooms available on 2025-02-28"</span><span class="o">;</span>
    <span class="nc">Boolean</span> <span class="n">evaluationResult</span> <span class="o">=</span> <span class="n">multiPassEvaluator</span><span class="o">.</span><span class="na">evaluate</span><span class="o">(</span><span class="n">condition</span><span class="o">,</span> <span class="n">response</span><span class="o">);</span>
    <span class="n">assertTrue</span><span class="o">(</span><span class="n">evaluationResult</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">...</span>
</code></pre></div></div>
<p><br />
Here is the output from a test run:</p>

<pre><code class="language-commandline">2024-07-21T17:05:35.367-06:00  INFO c.j.l.service.ChatServiceTestIT          : Response: The hotel is not available on 2025-02-28. If you would like to check availability for another date or have any other requests, please let me know!
2024-07-21T17:05:35.925-06:00  INFO c.j.l.helper.MultiPassEvaluator          : Evaluation 0 failed
2024-07-21T17:05:36.430-06:00  INFO c.j.l.helper.MultiPassEvaluator          : Evaluation 1 passed
2024-07-21T17:05:36.829-06:00  INFO c.j.l.helper.MultiPassEvaluator          : Evaluation 2 passed
</code></pre>
<p><br /></p>

<p>Interestingly enough, the evaluation failed on the first pass, but passed on the following two passes. This is largely 
why we use a multi-pass evaluation strategy. It helps to mitigate the non-deterministic nature of the LLM tasked with
evaluating the results. In a production environment, you may want to increase the number of passes and potentially 
tweak the temperature of the underlying <code class="language-plaintext highlighter-rouge">ChatLanguageModel</code> to improve evaluation accuracy.</p>
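
<p>If you do raise the pass count, one way to limit the extra LLM calls is to stop as soon as the vote is decided. Below is a minimal,
self-contained sketch of that idea. The <code class="language-plaintext highlighter-rouge">Evaluator</code> interface and the <code class="language-plaintext highlighter-rouge">EarlyExitMajorityVote</code> class are hypothetical names that stand in for the
<code class="language-plaintext highlighter-rouge">TestEvaluationAgent</code>; they are not part of the example project or of LangChain4J.</p>

```java
public class EarlyExitMajorityVote {

    /** Hypothetical stand-in for the article's TestEvaluationAgent. */
    public interface Evaluator {
        boolean evaluate(String condition, String result);
    }

    /**
     * Returns true when a strict majority of passCount evaluations pass.
     * Stops calling the evaluator once the outcome can no longer change.
     */
    public static boolean evaluate(Evaluator evaluator, String condition, String result, int passCount) {
        if (passCount < 1) {
            throw new IllegalArgumentException("passCount must be at least 1");
        }
        int needed = passCount / 2 + 1; // strict majority, e.g. 2 of 3
        int passed = 0;
        int failed = 0;
        for (int i = 0; i < passCount; i++) {
            if (evaluator.evaluate(condition, result)) {
                passed++;
            } else {
                failed++;
            }
            if (passed >= needed) {
                return true;  // majority reached
            }
            if (failed > passCount - needed) {
                return false; // majority is no longer reachable
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Scripted verdicts imitate the test run above: fail, pass, pass.
        boolean[] script = {false, true, true};
        int[] calls = {0};
        Evaluator scripted = (condition, result) -> script[calls[0]++];
        System.out.println(evaluate(scripted, "no rooms on 2025-02-28", "The hotel is not available.", 3));
    }
}
```

<p>With three passes, two matching verdicts in a row end the vote after two calls; a split such as the one in the log above still needs all three.</p>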

<h3 id="multi-phase-agent-evaluation-plan-test--evaluate">Multi-Phase Agent Evaluation (Plan, Test &amp; Evaluate)</h3>

<p>The final strategy that I’ll cover in this article is Multi-Phase Agent Evaluation. With this strategy, instead of performing 
the same evaluation multiple times, we will guide an agent through multiple phases: Planning, Execution &amp; Evaluation.</p>

<p>We will continue utilizing an LLM to evaluate our <code class="language-plaintext highlighter-rouge">BookingAgent</code> LLM, which is exposed via the <code class="language-plaintext highlighter-rouge">ChatService</code>. The evaluating agent will 
be given a description of the system's expected behavior, and it will both generate a test plan
and execute that plan.</p>

<p>The <code class="language-plaintext highlighter-rouge">TestAgent</code> (our QA agent) will interact with the <code class="language-plaintext highlighter-rouge">BookingAgent</code> through a <code class="language-plaintext highlighter-rouge">@Tool</code> that we expose to it. The <code class="language-plaintext highlighter-rouge">TestAgent</code>
will then be able to chat with the <code class="language-plaintext highlighter-rouge">BookingAgent</code> like a customer would.</p>

<p>First, we’ll wrap the <code class="language-plaintext highlighter-rouge">ChatService</code> in a <code class="language-plaintext highlighter-rouge">BookingAgentTool</code>:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Component</span>
<span class="nd">@RequiredArgsConstructor</span>
<span class="nd">@Slf4j</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">BookingAgentTool</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">ChatService</span> <span class="n">chatService</span><span class="o">;</span>

    <span class="nd">@Tool</span><span class="o">(</span><span class="s">"Interact with the Booking Agent -- Useful for testing the Booking Agent system"</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">String</span> <span class="nf">interactWithBookingAgent</span><span class="o">(</span><span class="nc">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"QA Agent Message: {}"</span><span class="o">,</span> <span class="n">message</span><span class="o">);</span>
        <span class="nc">String</span> <span class="n">response</span> <span class="o">=</span> <span class="n">chatService</span><span class="o">.</span><span class="na">chat</span><span class="o">(</span><span class="n">message</span><span class="o">);</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Booking Agent Response: {}"</span><span class="o">,</span> <span class="n">response</span><span class="o">);</span>
        <span class="k">return</span> <span class="n">response</span><span class="o">;</span>
    <span class="o">}</span>

<span class="o">}</span>
</code></pre></div></div>
<p><br />
By exposing the <code class="language-plaintext highlighter-rouge">ChatService</code> (and by extension the <code class="language-plaintext highlighter-rouge">BookingAgent</code>) as a <code class="language-plaintext highlighter-rouge">@Tool</code>, any agent equipped with the <code class="language-plaintext highlighter-rouge">BookingAgentTool</code>
component will be able to interact with the <code class="language-plaintext highlighter-rouge">BookingAgent</code> as though it were a customer or QA tester.</p>

<p>Next, we will define and create several methods encapsulating the different phases our <code class="language-plaintext highlighter-rouge">TestAgent</code> will be guided through.</p>

<p>We will define a method and prompt to:</p>

<ul>
  <li>Generate a test plan</li>
  <li>Execute the test plan</li>
  <li>Evaluate the test results</li>
</ul>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">TestAgent</span> <span class="o">{</span>

    <span class="nd">@SystemMessage</span><span class="o">({</span>
            <span class="s">"You are a world class QA engineer, your job is to test the system and ensure that it is working as expected."</span><span class="o">,</span>
            <span class="s">"You will be provided with a test plan, and it is your job to execute each test case individually and determine"</span><span class="o">,</span>
            <span class="s">"if the system is working as expected."</span><span class="o">,</span>
            <span class="s">"You will act as a customer interacting with a chatbot system to test the system's behavior."</span><span class="o">,</span>
    <span class="o">})</span>
    <span class="kd">public</span> <span class="nc">String</span> <span class="nf">test</span><span class="o">(</span><span class="nc">String</span> <span class="n">testCases</span><span class="o">);</span>

    <span class="nd">@SystemMessage</span><span class="o">({</span>
            <span class="s">"You are a world class QA engineer, your job is to test the system and ensure that it is working as expected."</span><span class="o">,</span>
            <span class="s">"You will be provided with an explanation of the System's behavior and you must carefully write test cases to"</span><span class="o">,</span>
            <span class="s">"ensure that the system meets the expected behavior. Your test cases should be a detailed description for usage"</span><span class="o">,</span>
            <span class="s">"by a different language model."</span><span class="o">,</span>
            <span class="s">"The System being tested is another Large Language Model, so the inputs and expected outputs can be in natural language."</span><span class="o">,</span>
            <span class="s">"Account for this possible variability in the rigidity of evaluation criteria."</span>
    <span class="o">})</span>
    <span class="nd">@UserMessage</span><span class="o">({</span>
            <span class="s">"Write test cases for the following system behavior:\n"</span><span class="o">,</span>
            <span class="s">"System Behavior: {{systemBehavior}}\n"</span>
    <span class="o">})</span>
    <span class="kd">public</span> <span class="nc">String</span> <span class="nf">writeTestCases</span><span class="o">(</span><span class="nd">@V</span><span class="o">(</span><span class="s">"systemBehavior"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">systemBehavior</span><span class="o">);</span>

    <span class="nd">@SystemMessage</span><span class="o">({</span>
            <span class="s">"You must carefully evaluate the results of the test plan to determine if the system is working as expected."</span><span class="o">,</span>
            <span class="s">"In the event of any failures, the result should be false. Otherwise, the result should be true."</span>
    <span class="o">})</span>
    <span class="nd">@UserMessage</span><span class="o">(</span><span class="s">"Evaluate the following test execution results: {{it}}"</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">Boolean</span> <span class="nf">evaluateResults</span><span class="o">(</span><span class="nc">String</span> <span class="n">testResults</span><span class="o">);</span>

<span class="o">}</span>

</code></pre></div></div>
<p><br /></p>

<p>Finally, we will create a <code class="language-plaintext highlighter-rouge">MultiPhaseEvaluator</code> class which will handle the flow-control of the <code class="language-plaintext highlighter-rouge">TestAgent</code>:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@RequiredArgsConstructor</span>
<span class="nd">@Slf4j</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MultiPhaseEvaluator</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">TestAgent</span> <span class="n">testAgent</span><span class="o">;</span>

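    <span class="cm">/**
     * Generates a test plan, executes the test plan, and evaluates the results for a given system description.
     * @param systemDescription natural-language description of the expected system behavior
     * @return the generated test plan, the raw execution results, and whether all tests passed
     */</span>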
    <span class="kd">public</span> <span class="nc">TestPlanResult</span> <span class="nf">generateAndExecuteTestPlan</span><span class="o">(</span><span class="nc">String</span> <span class="n">systemDescription</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">String</span> <span class="n">testCases</span> <span class="o">=</span> <span class="n">testAgent</span><span class="o">.</span><span class="na">writeTestCases</span><span class="o">(</span><span class="n">systemDescription</span><span class="o">);</span>
        <span class="nc">String</span> <span class="n">testPlanResults</span> <span class="o">=</span> <span class="n">testAgent</span><span class="o">.</span><span class="na">test</span><span class="o">(</span><span class="n">testCases</span><span class="o">);</span>
        <span class="nc">Boolean</span> <span class="n">testPlanResult</span> <span class="o">=</span> <span class="n">testAgent</span><span class="o">.</span><span class="na">evaluateResults</span><span class="o">(</span><span class="n">testPlanResults</span><span class="o">);</span>
        <span class="k">return</span> <span class="nc">TestPlanResult</span><span class="o">.</span><span class="na">builder</span><span class="o">()</span>
                <span class="o">.</span><span class="na">testPlan</span><span class="o">(</span><span class="n">testCases</span><span class="o">)</span>
                <span class="o">.</span><span class="na">testPlanResults</span><span class="o">(</span><span class="n">testPlanResults</span><span class="o">)</span>
                <span class="o">.</span><span class="na">allTestsPassed</span><span class="o">(</span><span class="n">testPlanResult</span><span class="o">)</span>
                <span class="o">.</span><span class="na">build</span><span class="o">();</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p><br />
Notice above that we’re passing the output from one LLM invocation to the next. This flow-control allows us to guide LLMs 
with task-specific prompts through a series of logical steps.</p>
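
<p>Because the flow-control is plain Java, the hand-off between phases can be checked without calling a model at all. The sketch below
is one way to do that, under a few assumptions: <code class="language-plaintext highlighter-rouge">PhaseAgent</code> is a hypothetical, annotation-free mirror of the <code class="language-plaintext highlighter-rouge">TestAgent</code> interface, and
the canned stub replaces the LLM so that each phase's input is visible in its output.</p>

```java
public class MultiPhaseFlowSketch {

    /** Hypothetical mirror of the article's TestAgent, without LangChain4J annotations. */
    public interface PhaseAgent {
        String writeTestCases(String systemBehavior);
        String test(String testCases);
        boolean evaluateResults(String testResults);
    }

    /** Chains the three phases; each phase consumes the previous phase's output. */
    public static boolean run(PhaseAgent agent, String systemDescription) {
        String plan = agent.writeTestCases(systemDescription);
        String results = agent.test(plan);
        return agent.evaluateResults(results);
    }

    public static void main(String[] args) {
        // A canned agent whose outputs make the hand-off between phases visible.
        PhaseAgent stub = new PhaseAgent() {
            public String writeTestCases(String systemBehavior) {
                return "PLAN(" + systemBehavior + ")";
            }
            public String test(String testCases) {
                return "RESULTS(" + testCases + ")";
            }
            public boolean evaluateResults(String testResults) {
                // Passes only if the results were derived from the generated plan.
                return testResults.equals("RESULTS(PLAN(hotel booking agent))");
            }
        };
        System.out.println(run(stub, "hotel booking agent"));
    }
}
```

<p>A stub like this only verifies the wiring. It says nothing about the quality of the plans or verdicts a real model would produce.</p>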

<p>I’ve created helper methods to provision the <code class="language-plaintext highlighter-rouge">TestAgent</code> and <code class="language-plaintext highlighter-rouge">MultiPhaseEvaluator</code>; you can view them in the complete example
on <a href="https://github.com/johnsosoka/code-examples/blob/main/java/langchain-booking-tests/src/test/java/com/johnsosoka/langchainbookingtests/service/ChatServiceTestIT.java">GitHub</a>. 
The important part is seeing this added to the test:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nd">@Test</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">testPlanCreationTest</span><span class="o">()</span> <span class="o">{</span>
        <span class="nc">String</span> <span class="n">systemDescription</span> <span class="o">=</span> <span class="sh">"""
                The system is a simple hotel booking agent. The agent should have the ability to:
                - Check the availability of a hotel room for a given date
                - Book a hotel room for a guest (check in &amp; check out date required)
                - Lookup a booking by guest name
                
                The system has the following preconditions:
                - The system has a hotel with 1 room available on 2025-01-15
                - The system has a hotel with 0 rooms available on 2025-02-28
                - All other dates should be considered unavailable
                """</span><span class="o">;</span>
        
        <span class="nc">TestPlanResult</span> <span class="n">testPlanResult</span> <span class="o">=</span> <span class="n">multiPhaseEvaluator</span><span class="o">.</span><span class="na">generateAndExecuteTestPlan</span><span class="o">(</span><span class="n">systemDescription</span><span class="o">);</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Test Plan: \n{}"</span><span class="o">,</span> <span class="n">testPlanResult</span><span class="o">.</span><span class="na">getTestPlan</span><span class="o">());</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Test Plan Results: \n{}"</span><span class="o">,</span> <span class="n">testPlanResult</span><span class="o">.</span><span class="na">getTestPlanResults</span><span class="o">());</span>
        <span class="n">assertTrue</span><span class="o">(</span><span class="n">testPlanResult</span><span class="o">.</span><span class="na">getAllTestsPassed</span><span class="o">());</span>
    <span class="o">}</span>
</code></pre></div></div>
<p><br /></p>

<p>You can see that we’re describing the capabilities of the system that’s going to be evaluated. Remember, we’re equipping
the <code class="language-plaintext highlighter-rouge">TestAgent</code> with a tool that allows it to interact with the <code class="language-plaintext highlighter-rouge">BookingAgent</code> via the <code class="language-plaintext highlighter-rouge">ChatService</code>. The <code class="language-plaintext highlighter-rouge">MultiPhaseEvaluator</code>
will guide the <code class="language-plaintext highlighter-rouge">TestAgent</code> through writing test cases, executing on those test cases, and evaluating the results.</p>

<p>Below is the conversation between the QA Agent and the Booking Agent:</p>

<pre><code class="language-commandline">QA Agent Message - Check availability for 2025-01-15
Booking Agent Response - There is availability for a room on January 15, 2025. Would you like to proceed with a booking? If so, please provide the guest name, check-in date, and check-out date.
QA Agent Message - Check availability for 2025-02-28
Booking Agent Response - Unfortunately, there is no availability for a room on February 28, 2025. Would you like me to check another date or assist you with anything else?
QA Agent Message - Check availability for 2025-03-10
Booking Agent Response - There is no availability for a room on March 10, 2025. Would you like me to check another date or help with something else?
QA Agent Message - Book a room for John Doe from 2025-01-15 to 2025-01-16
Booking Agent Response - The room has been successfully booked for John Doe from January 15, 2025, to January 16, 2025. If there's anything else you need assistance with, please let me know!
QA Agent Message - Book a room for Jane Smith from 2025-02-28 to 2025-03-01
Booking Agent Response - There is no availability on February 28, 2025, so I'm unable to book a room for Jane Smith on those dates. Would you like me to check dates around that period or assist you with something else?
QA Agent Message - Book a room for Alice Brown from 2025-03-10 to 2025-03-11
Booking Agent Response - There is no availability on March 10, 2025, so I'm unable to book a room for Alice Brown on those dates. Would you like me to check dates around that period or assist you with something else?
QA Agent Message - Lookup booking for John Doe
Booking Agent Response - I found a booking for John Doe with the following details:
- **Check-in Date:** January 15, 2025
- **Check-out Date:** January 16, 2025

Is there anything else you need assistance with?
QA Agent Message - Lookup booking for Jane Smith
Booking Agent Response - There is no booking found for Jane Smith. If you need help with something else or would like to make a new booking, please let me know!
QA Agent Message - Book a room for Mark Taylor from 2025-01-16 to 2025-01-15
Booking Agent Response - It looks like the check-out date is before the check-in date. Please provide the correct check-in and check-out dates so I can proceed with the booking for Mark Taylor.
QA Agent Message - Book a room for Emily Davis on 2025-01-15
Booking Agent Response - Could you please provide the check-out date for Emily Davis's booking on January 15, 2025, so I can proceed with the reservation?
</code></pre>
<p><br /></p>

<p>This is pretty interesting stuff: the <code class="language-plaintext highlighter-rouge">TestAgent</code> is able to generate a test plan &amp; interact with the booking agent to execute
that plan. The abridged version of the plan generated for the above execution run was:</p>

<ol>
  <li>Check availability for a date with available rooms</li>
  <li>Check availability for a date with no available rooms</li>
  <li>Check availability for a date not listed in preconditions</li>
  <li>Book a room for a guest with available dates</li>
  <li>Book a room for a guest with no available rooms</li>
  <li>Book a room for a guest with dates not listed in preconditions</li>
  <li>Lookup a booking by guest name that exists</li>
  <li>Lookup a booking by guest name that does not exist</li>
  <li>Book a room for a guest with check-out date before check-in date</li>
  <li>Book a room for a guest with check-in and check-out date being the same</li>
</ol>

<p>We could feasibly combine the multi-pass evaluation strategy with the agentic QA evaluation strategy to further improve
the accuracy and robustness of our tests, but that is a project for another day.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Testing Large Language Models is a new and unique challenge. I’m really excited to see what other ideas the industry comes 
up with in the future. We covered quite a bit of ground today, starting with a simple unit test using <code class="language-plaintext highlighter-rouge">contains</code> and working 
our way to a fully functional <code class="language-plaintext highlighter-rouge">MultiPhaseEvaluator</code> capable of generating test plans and executing on them. Hopefully, this article has
given you some ideas on how to test your own LLM applications. This was a fun project to work on, and I hope you 
found it as interesting as I did. Watching the two agents interact with each other was thrilling, and being able to use a JUnit 
assertion to evaluate the results was the cherry on top.</p>

<p>The complete example can be found on my GitHub <a href="https://github.com/johnsosoka/code-examples/tree/main/java/langchain-booking-tests">here</a>.</p>

<p>Happy coding!</p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="java" /><category term="langchain4j" /><category term="testing" /><category term="unit-testing" /><category term="junit" /><category term="mockito" /><category term="large-language-models" /><category term="llm" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">The Basic Building Blocks of Agents</title><link href="https://www.johnsosoka.com/blog/2024/05/23/agent-building-blocks.html" rel="alternate" type="text/html" title="The Basic Building Blocks of Agents" /><published>2024-05-23T00:00:00-06:00</published><updated>2024-05-23T00:00:00-06:00</updated><id>https://www.johnsosoka.com/blog/2024/05/23/agent-building-blocks</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2024/05/23/agent-building-blocks.html"><![CDATA[<p>I recently published a <a href="https://www.commerce-architects.com/post/the-basic-building-blocks-of-agents">blog post</a> for Commerce Architects 
introducing the basic building blocks for creating Agents/Agentic workflows in Java with LangChain4J. If you want a simple introduction 
to building these types of applications, this is it! In future articles, I’ll be exploring the design &amp; architecture of these thinking 
machines in more depth. Let me know what you think!</p>

<p>Read the Full Article Here: <a href="https://www.commerce-architects.com/post/the-basic-building-blocks-of-agents">The Basic Building Blocks of Agents</a></p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="LLM" /><category term="Agent" /><category term="Basics" /><category term="LangChain4J" /><category term="Agentic" /><summary type="html"><![CDATA[I recently published a blog post for Commerce Architects introducing the basic building blocks for creating Agents/Agentic workflows in Java with LangChain4J. If you want a simple introduction to building these types of applications, this is it! In future articles, I’ll be exploring the design &amp; architecture of these thinking machines in more depth. Let me know what you think!]]></summary></entry><entry><title type="html">Exploring Spring AI: Building a Simple Hotel Booking Agent</title><link href="https://www.johnsosoka.com/blog/2024/03/24/Spring-AI.html" rel="alternate" type="text/html" title="Exploring Spring AI: Building a Simple Hotel Booking Agent" /><published>2024-03-24T00:00:00-06:00</published><updated>2024-03-24T00:00:00-06:00</updated><id>https://www.johnsosoka.com/blog/2024/03/24/Spring-AI</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2024/03/24/Spring-AI.html"><![CDATA[<p><img src="https://media.johnsosoka.com/blog/2024-03-24/spring-ai-article.png" alt="header-img" /></p>

<p>I recently came across the <a href="https://docs.spring.io/spring-ai/reference/index.html">Spring AI</a> project, which “aims to 
streamline the development of applications that incorporate artificial intelligence functionality without unnecessary complexity.”</p>

<p>Thus far, I’ve been relying on the <a href="https://github.com/langchain4j/langchain4j/">LangChain4J</a> Framework for my AI projects, but as a Java developer &amp; Spring enthusiast, 
I was excited to see what Spring AI had to offer. Unfortunately, at the time of writing, the latest stable release <code class="language-plaintext highlighter-rouge">0.8.1</code>
does not support function calling, which is critical for most advanced use cases. As such, I will be working with the unstable 
<code class="language-plaintext highlighter-rouge">1.0.0-SNAPSHOT</code>. Function calls enable the Agent to “interact” with the rest of our software &amp; 3rd party services: the
framework intercepts a tool invocation request from the model &amp; calls the appropriate method defined in the callback.</p>

<p><strong>Today’s Project:</strong> Today, we will be building a simple Spring AI agent that will help manage a dummy hotel booking system.
It will be able to check availability, book rooms, and look up bookings by guest name. Function calls will be used to expose
these capabilities to the Agent.</p>

<p>I’m going to keep this project simple and focus more on utilizing the Spring AI framework rather than building a bullet-proof,
production-ready Agent. The complete code will be available on <a href="https://github.com/johnsosoka/code-examples/tree/main/java/spring-ai-booking">GitHub</a>.</p>

<h2 id="getting-started">Getting Started</h2>

<h3 id="dependencies">Dependencies</h3>

<p>Before adding the Spring AI dependency to your project, you will need to add the Spring AI Snapshot repository to your <code class="language-plaintext highlighter-rouge">pom.xml</code> file.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nt">&lt;repositories&gt;</span>
        <span class="nt">&lt;repository&gt;</span>
            <span class="nt">&lt;id&gt;</span>spring-snapshots<span class="nt">&lt;/id&gt;</span>
            <span class="nt">&lt;name&gt;</span>Spring Snapshots<span class="nt">&lt;/name&gt;</span>
            <span class="nt">&lt;url&gt;</span>https://repo.spring.io/snapshot<span class="nt">&lt;/url&gt;</span>
            <span class="nt">&lt;releases&gt;</span>
                <span class="nt">&lt;enabled&gt;</span>false<span class="nt">&lt;/enabled&gt;</span>
            <span class="nt">&lt;/releases&gt;</span>
        <span class="nt">&lt;/repository&gt;</span>
    <span class="nt">&lt;/repositories&gt;</span>
</code></pre></div></div>

<p>With the repository added, we can access the <code class="language-plaintext highlighter-rouge">1.0.0-SNAPSHOT</code> version of Spring AI, which is reported as unstable
but does support function calling.</p>

<p>Next, we’ll add the Spring AI dependency to our <code class="language-plaintext highlighter-rouge">pom.xml</code> file.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nt">&lt;dependency&gt;</span>
        <span class="nt">&lt;groupId&gt;</span>org.springframework.experimental<span class="nt">&lt;/groupId&gt;</span>
        <span class="nt">&lt;artifactId&gt;</span>spring-ai<span class="nt">&lt;/artifactId&gt;</span>
        <span class="nt">&lt;version&gt;</span>1.0.0-SNAPSHOT<span class="nt">&lt;/version&gt;</span>
    <span class="nt">&lt;/dependency&gt;</span>
</code></pre></div></div>

<h3 id="credentials">Credentials</h3>

<p>I’ll be using OpenAI for this project. The Spring-AI framework allows us to create an entry in our <code class="language-plaintext highlighter-rouge">application.properties</code> file:</p>

<div class="language-properties highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">spring.application.name</span><span class="p">=</span><span class="s">spring-ai-booking</span>
<span class="py">spring.ai.openai.api-key</span><span class="p">=</span><span class="s">${OPENAI_API_KEY}</span>
<span class="py">spring.ai.openai.chat.options.model</span><span class="p">=</span><span class="s">gpt-4-1106-preview</span>
</code></pre></div></div>

<p>I’m setting an environment variable <code class="language-plaintext highlighter-rouge">OPENAI_API_KEY</code> to my OpenAI API key in my Run Configuration. I have also opted to
specify the model. <code class="language-plaintext highlighter-rouge">gpt-4-1106-preview</code> is a solid choice for our project as it boasts improved function call capabilities
and a larger context window.</p>

<h3 id="sanity-check">Sanity-Check</h3>

<p>Now that our dependencies are set up, let’s do a quick sanity-check to ensure that everything is working as expected.</p>

<p>The following code snippet was pulled directly from the Spring AI documentation:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@RestController</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">SimpleAiController</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">ChatClient</span> <span class="n">chatClient</span><span class="o">;</span>

    <span class="nd">@Autowired</span>
    <span class="kd">public</span> <span class="nf">SimpleAiController</span><span class="o">(</span><span class="nc">ChatClient</span> <span class="n">chatClient</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">this</span><span class="o">.</span><span class="na">chatClient</span> <span class="o">=</span> <span class="n">chatClient</span><span class="o">;</span>
    <span class="o">}</span>

    <span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/ai/simple"</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">Map</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="nf">completion</span><span class="o">(</span><span class="nd">@RequestParam</span><span class="o">(</span><span class="n">value</span> <span class="o">=</span> <span class="s">"message"</span><span class="o">,</span> <span class="n">defaultValue</span> <span class="o">=</span> <span class="s">"Tell me a joke"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">return</span> <span class="nc">Map</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"generation"</span><span class="o">,</span> <span class="n">chatClient</span><span class="o">.</span><span class="na">call</span><span class="o">(</span><span class="n">message</span><span class="o">));</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>With my application running, I want to test the endpoint by sending a <code class="language-plaintext highlighter-rouge">GET</code> request to <code class="language-plaintext highlighter-rouge">http://localhost:8080/ai/simple</code></p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/tmp ❯ curl http://localhost:8080/ai/simple
<span class="o">{</span><span class="s2">"generation"</span>:<span class="s2">"Why did the scarecrow win an award?</span><span class="se">\n</span><span class="s2">Because he was outstanding in his field!"</span><span class="o">}</span>%
</code></pre></div></div>

<p>Looks like we’re in business! Now we can move on to building our Hotel Booking Agent.</p>

<p><strong>Note:</strong> Thus far, this is all the code required. The Spring-AI framework was able to configure and autowire the
ChatClient for us using the API key we provided in the <code class="language-plaintext highlighter-rouge">application.properties</code> file. <code class="language-plaintext highlighter-rouge">LangChain4J</code> requires a bit more
configuration to get started.</p>

<h2 id="creating-the-hotel-booking-agent">Creating the Hotel Booking Agent</h2>

<p>The hotel booking agent will be a simple agent that can handle the following commands:</p>

<ul>
  <li>check availability</li>
  <li>book a room</li>
  <li>look up a booking by guest name</li>
</ul>

<p>In addition to these command capabilities, we will be using a simple in-memory data store to manage both the hotel bookings 
and the conversation context. I mentioned earlier that this exploration is more about the Spring AI framework than building a 
production-ready agent, so we will only support a single conversation at a time.</p>

<h3 id="booking-service-and-function-calls">Booking Service and Function Calls</h3>

<p>I’ve created a simple booking service that will manage the hotel bookings. While I will spare you the details of its implementation,
as it is not the focus of this post, I do want to point out some conditions I’ve set up for the service:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="cm">/**
     * Initializes the availability of rooms for specific dates for demonstration purposes.
     */</span>
    <span class="nd">@PostConstruct</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">init</span><span class="o">()</span> <span class="o">{</span>
        <span class="c1">// Set availability for January 15, 2025 (available)</span>
        <span class="nc">LocalDate</span> <span class="n">availableDate</span> <span class="o">=</span> <span class="nc">LocalDate</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="mi">2025</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">15</span><span class="o">);</span>
        <span class="n">setAvailability</span><span class="o">(</span><span class="n">availableDate</span><span class="o">,</span> <span class="mi">5</span><span class="o">);</span>

        <span class="c1">// Set availability for February 28, 2025 (unavailable)</span>
        <span class="nc">LocalDate</span> <span class="n">unavailableDate</span> <span class="o">=</span> <span class="nc">LocalDate</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="mi">2025</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">28</span><span class="o">);</span>
        <span class="n">setAvailability</span><span class="o">(</span><span class="n">unavailableDate</span><span class="o">,</span> <span class="mi">0</span><span class="o">);</span>
    <span class="o">}</span>
</code></pre></div></div>

<p>The full service can be found <a href="https://github.com/johnsosoka/code-examples/blob/main/java/spring-ai-booking/src/main/java/com/johnsosoka/springaibooking/service/HotelBookingService.java">here</a>.</p>

<p>While the service exposes methods to check availability, book a room, and cancel a booking, the LLM Agent will not be able 
to interact with the service directly. To wire up the service to the LLM Agent, we will need to define a function and 
expose it to the LLM Agent.</p>
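
<p>Before looking at Spring AI’s approach, here is a rough, framework-free sketch of the general idea: the client keeps a registry of named callbacks, and when the model asks for a tool by name, the client looks it up and runs it. Everything in it (<code class="language-plaintext highlighter-rouge">ToolDispatchSketch</code>, <code class="language-plaintext highlighter-rouge">register</code>, <code class="language-plaintext highlighter-rouge">dispatch</code>, and the string-only arguments) is hypothetical and only illustrates the concept; it is not how Spring AI is implemented internally.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, framework-free sketch of tool dispatch; none of these names come from Spring AI.
public class ToolDispatchSketch {

    // Registry of tool name -> callback. Arguments and results are plain strings for brevity.
    private final Map<String, Function<String, String>> tools = new LinkedHashMap<>();

    public void register(String name, Function<String, String> callback) {
        tools.put(name, callback);
    }

    // Called when the model's reply is a tool invocation request rather than plain text.
    public String dispatch(String toolName, String argument) {
        Function<String, String> callback = tools.get(toolName);
        if (callback == null) {
            return "Unknown tool: " + toolName;
        }
        return callback.apply(argument);
    }

    public static void main(String[] args) {
        ToolDispatchSketch dispatcher = new ToolDispatchSketch();
        // Stand-in for a real service call: only one date has rooms in this demo.
        dispatcher.register("CheckAvailability", date -> String.valueOf("2025-01-15".equals(date)));

        System.out.println(dispatcher.dispatch("CheckAvailability", "2025-01-15"));
        System.out.println(dispatcher.dispatch("CheckAvailability", "2025-02-28"));
    }
}
```

<p>In a real framework the callback result is sent back to the model as another message, so the model can phrase the final answer for the user.</p>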

<p>In the official Spring AI function calling documentation, they provide the following example:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MockWeatherService</span> <span class="kd">implements</span> <span class="nc">Function</span><span class="o">&lt;</span><span class="nc">Request</span><span class="o">,</span> <span class="nc">Response</span><span class="o">&gt;</span> <span class="o">{</span>

	<span class="kd">public</span> <span class="kd">enum</span> <span class="nc">Unit</span> <span class="o">{</span> <span class="no">C</span><span class="o">,</span> <span class="no">F</span> <span class="o">}</span>
	<span class="kd">public</span> <span class="kd">record</span> <span class="nf">Request</span><span class="o">(</span><span class="nc">String</span> <span class="n">location</span><span class="o">,</span> <span class="nc">Unit</span> <span class="n">unit</span><span class="o">)</span> <span class="o">{}</span>
	<span class="kd">public</span> <span class="kd">record</span> <span class="nf">Response</span><span class="o">(</span><span class="kt">double</span> <span class="n">temp</span><span class="o">,</span> <span class="nc">Unit</span> <span class="n">unit</span><span class="o">)</span> <span class="o">{}</span>

	<span class="kd">public</span> <span class="nc">Response</span> <span class="nf">apply</span><span class="o">(</span><span class="nc">Request</span> <span class="n">request</span><span class="o">)</span> <span class="o">{</span>
		<span class="k">return</span> <span class="k">new</span> <span class="nf">Response</span><span class="o">(</span><span class="mf">30.0</span><span class="o">,</span> <span class="nc">Unit</span><span class="o">.</span><span class="na">C</span><span class="o">);</span>
	<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>The above is supposed to wrap a 3rd party service. After spending some time reading the documentation and source code, I was
unable to find a way to expose multiple methods on a service without creating multiple classes. I was really hoping that
exposing a service call would be as simple as a <code class="language-plaintext highlighter-rouge">@Tool</code> annotation on a method, like in the <code class="language-plaintext highlighter-rouge">LangChain4J</code> framework.</p>

<h4 id="booking-tools">Booking Tools</h4>

<p>We will define a function that will expose a single method call on the HotelBookingService to the Agent.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Component</span>
<span class="nd">@RequiredArgsConstructor</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">CheckAvailabilityTool</span> <span class="kd">implements</span> <span class="nc">Function</span><span class="o">&lt;</span><span class="nc">CheckAvailabilityTool</span><span class="o">.</span><span class="na">Request</span><span class="o">,</span> <span class="nc">CheckAvailabilityTool</span><span class="o">.</span><span class="na">Response</span><span class="o">&gt;</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">HotelBookingService</span> <span class="n">hotelBookingService</span><span class="o">;</span>

    <span class="kd">public</span> <span class="kd">record</span> <span class="nf">Request</span><span class="o">(</span><span class="nc">String</span> <span class="n">date</span><span class="o">)</span> <span class="o">{}</span>
    <span class="kd">public</span> <span class="kd">record</span> <span class="nf">Response</span><span class="o">(</span><span class="kt">boolean</span> <span class="n">available</span><span class="o">)</span> <span class="o">{}</span>

    <span class="nd">@Override</span>
    <span class="kd">public</span> <span class="nc">Response</span> <span class="nf">apply</span><span class="o">(</span><span class="nc">Request</span> <span class="n">request</span><span class="o">)</span> <span class="o">{</span>
        <span class="c1">// LocalDate from a string</span>
        <span class="nc">LocalDate</span> <span class="n">date</span> <span class="o">=</span> <span class="nc">LocalDate</span><span class="o">.</span><span class="na">parse</span><span class="o">(</span><span class="n">request</span><span class="o">.</span><span class="na">date</span><span class="o">);</span>
        <span class="nc">Boolean</span> <span class="n">isAvailable</span> <span class="o">=</span> <span class="n">hotelBookingService</span><span class="o">.</span><span class="na">isAvailable</span><span class="o">(</span><span class="n">date</span><span class="o">);</span>

        <span class="k">return</span> <span class="k">new</span> <span class="nf">Response</span><span class="o">(</span><span class="n">isAvailable</span><span class="o">);</span>
    <span class="o">}</span>

<span class="o">}</span>
</code></pre></div></div>
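
<p>One thing to keep in mind: <code class="language-plaintext highlighter-rouge">LocalDate.parse(String)</code> only accepts ISO dates such as <code class="language-plaintext highlighter-rouge">2025-02-28</code>. If the model ever passed a differently formatted date (an assumption on my part; I have not verified which formats it actually sends), the call would throw. A minimal sketch of a more forgiving parser follows. The <code class="language-plaintext highlighter-rouge">LenientDateParser</code> class and its list of accepted formats are hypothetical and not part of the example project.</p>

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: tries a few date formats in order instead of assuming ISO input.
public class LenientDateParser {

    // The accepted formats are an illustrative guess, not an exhaustive list.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,                             // 2025-02-28
            DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH),  // February 28, 2025
            DateTimeFormatter.ofPattern("M/d/yyyy"));                     // 2/28/2025

    // Returns the parsed date, or an empty Optional when no format matches.
    public static Optional<LocalDate> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String text = raw.trim();
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(text, format));
            } catch (DateTimeParseException ignored) {
                // Fall through and try the next format.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("February 28, 2025"));
        System.out.println(parse("not a date"));
    }
}
```

<p>Returning an empty <code class="language-plaintext highlighter-rouge">Optional</code> lets the tool report “I could not understand that date” back to the model instead of failing the whole request.</p>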

<p>To test this out, I’ve modified the <code class="language-plaintext highlighter-rouge">SimpleAiController</code>. Here’s how it looks now:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@RestController</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">SimpleAiController</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">ChatClient</span> <span class="n">chatClient</span><span class="o">;</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">CheckAvailabilityTool</span> <span class="n">checkAvailabilityTool</span><span class="o">;</span>

    <span class="nd">@Autowired</span>
    <span class="kd">public</span> <span class="nf">SimpleAiController</span><span class="o">(</span><span class="nc">ChatClient</span> <span class="n">chatClient</span><span class="o">,</span> <span class="nc">CheckAvailabilityTool</span> <span class="n">checkAvailabilityTool</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">this</span><span class="o">.</span><span class="na">chatClient</span> <span class="o">=</span> <span class="n">chatClient</span><span class="o">;</span>
        <span class="k">this</span><span class="o">.</span><span class="na">checkAvailabilityTool</span> <span class="o">=</span> <span class="n">checkAvailabilityTool</span><span class="o">;</span>
    <span class="o">}</span>

    <span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/ai/simple"</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">Map</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="nf">completion</span><span class="o">(</span><span class="nd">@RequestParam</span><span class="o">(</span><span class="n">value</span> <span class="o">=</span> <span class="s">"message"</span><span class="o">,</span> <span class="n">defaultValue</span> <span class="o">=</span> <span class="s">"Do you have any rooms available on February 28, 2025"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span>

        <span class="nc">UserMessage</span> <span class="n">userMessage</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">UserMessage</span><span class="o">(</span><span class="n">message</span><span class="o">);</span>

        <span class="kt">var</span> <span class="n">promptOptions</span> <span class="o">=</span> <span class="nc">OpenAiChatOptions</span><span class="o">.</span><span class="na">builder</span><span class="o">()</span>
                <span class="o">.</span><span class="na">withFunctionCallbacks</span><span class="o">(</span><span class="nc">List</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="nc">FunctionCallbackWrapper</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="n">checkAvailabilityTool</span><span class="o">)</span>
                        <span class="o">.</span><span class="na">withName</span><span class="o">(</span><span class="s">"CheckAvailability"</span><span class="o">)</span>
                        <span class="o">.</span><span class="na">withDescription</span><span class="o">(</span><span class="s">"Check the availability of rooms for a specific date"</span><span class="o">)</span>
                        <span class="o">.</span><span class="na">withResponseConverter</span><span class="o">((</span><span class="n">response</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="s">""</span> <span class="o">+</span> <span class="n">response</span><span class="o">.</span><span class="na">available</span><span class="o">())</span>
                        <span class="o">.</span><span class="na">build</span><span class="o">()))</span>
                <span class="o">.</span><span class="na">build</span><span class="o">();</span>

        <span class="nc">ChatResponse</span> <span class="n">response</span> <span class="o">=</span> <span class="n">chatClient</span><span class="o">.</span><span class="na">call</span><span class="o">(</span><span class="k">new</span> <span class="nc">Prompt</span><span class="o">(</span><span class="nc">List</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">userMessage</span><span class="o">),</span> <span class="n">promptOptions</span><span class="o">));</span>
        <span class="k">return</span> <span class="nc">Map</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"generation"</span><span class="o">,</span> <span class="n">response</span><span class="o">.</span><span class="na">getResult</span><span class="o">().</span><span class="na">toString</span><span class="o">());</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>You can see that the ChatClient/LLM is now aware of the <code class="language-plaintext highlighter-rouge">CheckAvailabilityTool</code> function. I’ve also updated the default
message value to include a date that is set to be unavailable in the <code class="language-plaintext highlighter-rouge">HotelBookingService</code>.</p>

<p>Let’s test it out:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/tmp ❯ curl http://localhost:8080/ai/simple
<span class="o">{</span><span class="s2">"generation"</span>:<span class="s2">"Generation{assistantMessage=AssistantMessage{content='I'm sorry, but we do not have any rooms available on February 28, 2025.', properties={role=ASSISTANT, finishReason=STOP, id=chatcmpl-9658XUpnLnRKZ2LUlbk4DbF00WBBV}, messageType=ASSISTANT}, chatGenerationMetadata=org.springframework.ai.chat.metadata.ChatGenerationMetadata</span><span class="nv">$1</span><span class="s2">@642222bf}"</span><span class="o">}</span>
</code></pre></div></div>

<p>Perfect! The LLM was able to call the <code class="language-plaintext highlighter-rouge">CheckAvailabilityTool</code> function and respond accordingly. I will continue to build
wrapper functions for the <code class="language-plaintext highlighter-rouge">BookRoom</code> and <code class="language-plaintext highlighter-rouge">CancelBooking</code> methods in the <code class="language-plaintext highlighter-rouge">HotelBookingService</code>.</p>

<h4 id="chat-management">Chat Management</h4>

<p>Spring AI doesn’t appear to provide an out-of-the-box mechanism for managing the conversation context. I will be using a simple in-memory
store, backed by another singleton-scoped Spring service, to manage <em>a single</em> conversation.</p>

<p><strong>Note:</strong> LLM APIs generally do not manage conversation context themselves. That is typically handled by the client,
which saves all relevant messages &amp; sends them with each request so the model can generate a response.</p>

<p>Below is the simple ConversationService I’ve created:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Service</span>
<span class="nd">@Scope</span><span class="o">(</span><span class="s">"singleton"</span><span class="o">)</span>
<span class="nd">@Slf4j</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">ConversationService</span> <span class="o">{</span>
    <span class="kd">private</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Message</span><span class="o">&gt;</span> <span class="n">messageList</span> <span class="o">=</span> <span class="nc">Collections</span><span class="o">.</span><span class="na">synchronizedList</span><span class="o">(</span><span class="k">new</span> <span class="nc">ArrayList</span><span class="o">&lt;&gt;());</span>

    <span class="cm">/**
     * Adds a message to the conversation.
     *
     * @param message the message to be added to the conversation
     * @return the updated list of messages in the conversation
     */</span>
    <span class="kd">public</span> <span class="kd">synchronized</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Message</span><span class="o">&gt;</span> <span class="nf">addMessage</span><span class="o">(</span><span class="nc">Message</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">messageList</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">message</span><span class="o">);</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Added message to conversation: {}, total messages: {}"</span><span class="o">,</span> <span class="n">message</span><span class="o">,</span> <span class="n">messageList</span><span class="o">.</span><span class="na">size</span><span class="o">());</span>
        <span class="k">return</span> <span class="n">messageList</span><span class="o">;</span>
    <span class="o">}</span>

    <span class="cm">/**
     * Retrieves all messages in the conversation.
     *
     * @return the list of messages in the conversation
     */</span>
    <span class="kd">public</span> <span class="kd">synchronized</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Message</span><span class="o">&gt;</span> <span class="nf">getAllMessages</span><span class="o">()</span> <span class="o">{</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Retrieved all {} messages"</span><span class="o">,</span> <span class="n">messageList</span><span class="o">.</span><span class="na">size</span><span class="o">());</span>
        <span class="k">return</span> <span class="n">messageList</span><span class="o">;</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>In a real-world application, this would be replaced with a more robust solution, such as Redis or a database, and would
support concurrent conversations. As written, it only saves the context of a single, shared conversation.</p>
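
<p>As a rough illustration of what supporting concurrent conversations could look like while staying in memory, the sketch below keys each message list by a conversation ID. The <code class="language-plaintext highlighter-rouge">ConversationStoreSketch</code> class, the <code class="language-plaintext highlighter-rouge">ChatMessage</code> record (a simplified stand-in for Spring AI’s <code class="language-plaintext highlighter-rouge">Message</code>), and the <code class="language-plaintext highlighter-rouge">conversationId</code> parameter are all hypothetical; the example project does not include them.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: one message history per conversation ID, held in memory.
public class ConversationStoreSketch {

    // Simplified stand-in for Spring AI's Message type.
    public record ChatMessage(String role, String content) {}

    private final Map<String, List<ChatMessage>> conversations = new ConcurrentHashMap<>();

    // Appends a message, creating the conversation on first use.
    public void addMessage(String conversationId, ChatMessage message) {
        conversations
                .computeIfAbsent(conversationId, id -> new CopyOnWriteArrayList<>())
                .add(message);
    }

    // Returns an immutable snapshot so callers cannot modify the stored history.
    public List<ChatMessage> getAllMessages(String conversationId) {
        return List.copyOf(conversations.getOrDefault(conversationId, List.of()));
    }

    public static void main(String[] args) {
        ConversationStoreSketch store = new ConversationStoreSketch();
        store.addMessage("guest-a", new ChatMessage("user", "Any rooms on January 15, 2025?"));
        store.addMessage("guest-b", new ChatMessage("user", "I would like to find my booking."));
        System.out.println(store.getAllMessages("guest-a").size());
    }
}
```

<p>The controller would then need some way to identify the caller, such as a session or a request parameter, and pass that ID along with every message.</p>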

<h4 id="the-agent">The Agent</h4>

<p>Now that we have the services we will require, as well as methods on those services exposed as tools, we can build
the Agent. The Agent will need to utilize the <code class="language-plaintext highlighter-rouge">ConversationService</code> to manage the conversation context, and will need to
be aware of the <code class="language-plaintext highlighter-rouge">CheckAvailabilityTool</code>, <code class="language-plaintext highlighter-rouge">BookRoomTool</code>, and <code class="language-plaintext highlighter-rouge">FindBookingTool</code> capabilities to fulfill its mission as a 
hotel booking agent.</p>

<p>Review the code below:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Component</span>
<span class="nd">@RequiredArgsConstructor</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">BookingAgent</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">ChatClient</span> <span class="n">chatClient</span><span class="o">;</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">CheckAvailabilityTool</span> <span class="n">checkAvailabilityTool</span><span class="o">;</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">FindBookingTool</span> <span class="n">findBookingTool</span><span class="o">;</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">BookRoomTool</span> <span class="n">bookRoomTool</span><span class="o">;</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">ConversationService</span> <span class="n">conversationService</span><span class="o">;</span>

    <span class="cm">/**
     * When the BookingAgent is created, we will define the agent's role as a SystemMessage at the top of the conversation.
     */</span>
    <span class="nd">@PostConstruct</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">defineAgentProfile</span><span class="o">()</span> <span class="o">{</span>
        <span class="nc">String</span> <span class="n">agentProfile</span> <span class="o">=</span> <span class="s">"You are a booking agent for an online hotel. You are here to help customers book rooms and check availability. "</span> <span class="o">+</span>
                <span class="s">"Use the tools you have access to in order to help customers with their requests. You can check availability, book rooms, and find bookings."</span><span class="o">;</span>
        <span class="nc">SystemMessage</span> <span class="n">systemMessage</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SystemMessage</span><span class="o">(</span><span class="n">agentProfile</span><span class="o">);</span>
        <span class="n">conversationService</span><span class="o">.</span><span class="na">addMessage</span><span class="o">(</span><span class="n">systemMessage</span><span class="o">);</span>
    <span class="o">}</span>


    <span class="cm">/**
     * When a message is sent to the agent, the agent will handle the message and return a response.
     * @param message
     * @return
     */</span>
    <span class="kd">public</span> <span class="nc">String</span> <span class="nf">handleMessage</span><span class="o">(</span><span class="nc">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span>
        <span class="c1">// Add the user message to the conversation</span>
        <span class="nc">UserMessage</span> <span class="n">latestMessage</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">UserMessage</span><span class="o">(</span><span class="n">message</span><span class="o">);</span>
        <span class="n">conversationService</span><span class="o">.</span><span class="na">addMessage</span><span class="o">(</span><span class="n">latestMessage</span><span class="o">);</span>

        <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Message</span><span class="o">&gt;</span> <span class="n">messages</span> <span class="o">=</span> <span class="n">conversationService</span><span class="o">.</span><span class="na">getAllMessages</span><span class="o">();</span>
        <span class="kt">var</span> <span class="n">promptOptions</span> <span class="o">=</span> <span class="n">getPromptOptions</span><span class="o">();</span>

        <span class="nc">ChatResponse</span> <span class="n">response</span> <span class="o">=</span> <span class="n">chatClient</span><span class="o">.</span><span class="na">call</span><span class="o">(</span><span class="k">new</span> <span class="nc">Prompt</span><span class="o">(</span><span class="n">messages</span><span class="o">,</span> <span class="n">promptOptions</span><span class="o">));</span>
        <span class="c1">// Add the assistant response to the conversation</span>
        <span class="n">conversationService</span><span class="o">.</span><span class="na">addMessage</span><span class="o">(</span><span class="n">response</span><span class="o">.</span><span class="na">getResult</span><span class="o">().</span><span class="na">getOutput</span><span class="o">());</span>

        <span class="c1">// Return the assistant response</span>
        <span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="na">getResult</span><span class="o">().</span><span class="na">getOutput</span><span class="o">().</span><span class="na">getContent</span><span class="o">();</span>

    <span class="o">}</span>

    <span class="cm">/**
     * Expose function callbacks to the OpenAI chat client
     *
     * @return
     */</span>
    <span class="kd">private</span> <span class="nc">OpenAiChatOptions</span> <span class="nf">getPromptOptions</span><span class="o">()</span> <span class="o">{</span>
        <span class="k">return</span> <span class="nc">OpenAiChatOptions</span><span class="o">.</span><span class="na">builder</span><span class="o">()</span>
                <span class="o">.</span><span class="na">withFunctionCallbacks</span><span class="o">(</span><span class="nc">List</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="nc">FunctionCallbackWrapper</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="n">checkAvailabilityTool</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withName</span><span class="o">(</span><span class="s">"CheckAvailability"</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withDescription</span><span class="o">(</span><span class="s">"Helpful for checking the availability of rooms for a specific date, this should be used before booking a room for a new guest."</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withResponseConverter</span><span class="o">((</span><span class="n">response</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="s">""</span> <span class="o">+</span> <span class="n">response</span><span class="o">.</span><span class="na">available</span><span class="o">())</span>
                                <span class="o">.</span><span class="na">build</span><span class="o">(),</span>
                        <span class="nc">FunctionCallbackWrapper</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="n">bookRoomTool</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withName</span><span class="o">(</span><span class="s">"BookRoom"</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withDescription</span><span class="o">(</span><span class="s">"Helpful for booking a room for a new guest for a specific check-in and check-out date"</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withResponseConverter</span><span class="o">((</span><span class="n">response</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">response</span><span class="o">.</span><span class="na">bookingStatus</span><span class="o">())</span>
                                <span class="o">.</span><span class="na">build</span><span class="o">(),</span>
                        <span class="nc">FunctionCallbackWrapper</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="n">findBookingTool</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withName</span><span class="o">(</span><span class="s">"FindBooking"</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withDescription</span><span class="o">(</span><span class="s">"Helpful to determine if an existing guest has booked a room"</span><span class="o">)</span>
                                <span class="o">.</span><span class="na">withResponseConverter</span><span class="o">((</span><span class="n">response</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">response</span><span class="o">.</span><span class="na">booking</span><span class="o">())</span>
                                <span class="o">.</span><span class="na">build</span><span class="o">()))</span>
                <span class="o">.</span><span class="na">build</span><span class="o">();</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">defineAgentProfile</code> method is used to set the stage for the agent. This would be referred to as “The Profiling Module”
in <a href="https://arxiv.org/pdf/2308.11432.pdf">Unified Framework for LLM Based Agents</a>. When the agent is created a <code class="language-plaintext highlighter-rouge">SystemMessage</code>
explaining the agent’s role is added to the top of the conversation.</p>

<p>When <code class="language-plaintext highlighter-rouge">ChatResponse response = chatClient.call(new Prompt(messages, promptOptions));</code> is called, the Agent is provided
the entire context (note the <code class="language-plaintext highlighter-rouge">messages</code> list being passed to the new prompt) as well as the <code class="language-plaintext highlighter-rouge">promptOptions</code>, which include
all of the function callbacks that the Agent has access to.</p>

<p><strong>Note:</strong> Spring provides multiple ways to expose tools to the Agent, including <code class="language-plaintext highlighter-rouge">@Bean</code> annotations in a configuration class.
I opted to use the <code class="language-plaintext highlighter-rouge">@Component</code> annotation so that readers could see the entire Agent in one
place. You can read about other ways to expose tools in the <a href="https://docs.spring.io/spring-ai/reference/1.0-SNAPSHOT/api/chat/functions/openai-chat-functions.html#_registering_functions_as_beans">Spring AI documentation</a>.</p>

<h4 id="testing-the-agent">Testing the Agent</h4>

<p>In order to test the agent, I created a simple <code class="language-plaintext highlighter-rouge">BookingAgentTest</code> class which will invoke it with a few different messages.
Here are the contents of that test, then we’ll walk through the output:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@SpringBootTest</span>
<span class="kd">class</span> <span class="nc">BookingAgentTest</span> <span class="o">{</span>

    <span class="nd">@Autowired</span>
    <span class="kd">private</span> <span class="nc">BookingAgent</span> <span class="n">bookingAgent</span><span class="o">;</span>

    <span class="nd">@Test</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">testBookingConversation</span><span class="o">()</span> <span class="o">{</span>
        <span class="nc">String</span> <span class="n">firstMessage</span> <span class="o">=</span> <span class="s">"Hi, my name is John--Can you see if any rooms are available on February 28, 2025?"</span><span class="o">;</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">bookingAgent</span><span class="o">.</span><span class="na">handleMessage</span><span class="o">(</span><span class="n">firstMessage</span><span class="o">));</span>
        <span class="nc">String</span> <span class="n">availability</span> <span class="o">=</span> <span class="s">"Do you have any availability on January 15th, 2025?"</span><span class="o">;</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">bookingAgent</span><span class="o">.</span><span class="na">handleMessage</span><span class="o">(</span><span class="n">availability</span><span class="o">));</span>
        <span class="c1">// Start a new conversation</span>
        <span class="nc">String</span> <span class="n">alternativeDate</span> <span class="o">=</span> <span class="s">"Please book 1 room for John on January 15h, 2025. The check-out date will be January 21st, 2025."</span><span class="o">;</span>
        <span class="c1">// Expect a successful booking</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">bookingAgent</span><span class="o">.</span><span class="na">handleMessage</span><span class="o">(</span><span class="n">alternativeDate</span><span class="o">));</span>
        <span class="nc">String</span> <span class="n">checkBooking</span> <span class="o">=</span> <span class="s">"Can you see if a guest John has reserved any rooms?"</span><span class="o">;</span>
        <span class="c1">// Expect a yes</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">bookingAgent</span><span class="o">.</span><span class="na">handleMessage</span><span class="o">(</span><span class="n">checkBooking</span><span class="o">));</span>

        <span class="c1">// Demonstrate persisted conversation context</span>
        <span class="nc">String</span> <span class="n">summarize</span> <span class="o">=</span> <span class="s">"Can you summarize our discussion today?"</span><span class="o">;</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">bookingAgent</span><span class="o">.</span><span class="na">handleMessage</span><span class="o">(</span><span class="n">summarize</span><span class="o">));</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>In the above, we can see that we set up a few different scenarios for the agent to handle.</p>
<ol>
  <li>Check on a date that is unavailable - The Agent should look up the availability and respond that there are no rooms available.</li>
  <li>Check on a date that is available - The Agent should look up the availability and respond that there are rooms available.</li>
  <li>Book a room - The Agent should book a room for John on the specified date range.</li>
  <li>Check if a booking exists - The Agent should respond that a booking exists for John.</li>
  <li>Summarize the conversation - The Agent should summarize the conversation.</li>
</ol>

<p>The final step, summarizing the conversation, is really just there to illustrate that the <code class="language-plaintext highlighter-rouge">ConversationService</code> is correctly
managing the conversation. It should be housing all messages from both Agent &amp; User and enable the Agent to provide
a complete summary of the conversation.</p>

<p>Here are some selected outputs from the test (I’ve added the comments over log statements to provide context):</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 1st User message</span>
Added message to conversation: UserMessage<span class="o">{</span><span class="nv">content</span><span class="o">=</span><span class="s1">'Hi, my name is John--Can you see if any rooms are available on February 28, 2025?'</span>, <span class="nv">properties</span><span class="o">={}</span>, <span class="nv">messageType</span><span class="o">=</span>USER<span class="o">}</span>, total messages: 2
<span class="c"># Agent response indicating Feb 28, 2025 is unavailable (as expected)</span>
Added message to conversation: AssistantMessage<span class="o">{</span><span class="nv">content</span><span class="o">=</span><span class="s1">'I'</span>m sorry, John, but it looks like there are no rooms available on February 28, 2025. If you have any flexibility with your dates or need assistance with anything <span class="k">else</span>, please <span class="nb">let </span>me know, and I<span class="s1">'ll be happy to help.'</span>, <span class="nv">properties</span><span class="o">={</span><span class="nv">role</span><span class="o">=</span>ASSISTANT, <span class="nv">finishReason</span><span class="o">=</span>STOP, <span class="nb">id</span><span class="o">=</span>chatcmpl-96K38H75d6hEhDErQtBf3UiT5VZRC<span class="o">}</span>, <span class="nv">messageType</span><span class="o">=</span>ASSISTANT<span class="o">}</span>, total messages: 3
I<span class="s1">'m sorry, John, but it looks like there are no rooms available on February 28, 2025. If you have any flexibility with your dates or need assistance with anything else, please let me know, and I'</span>ll be happy to help.
<span class="c"># User suggests a different date</span>
Added message to conversation: UserMessage<span class="o">{</span><span class="nv">content</span><span class="o">=</span><span class="s1">'Do you have any availability on January 15th, 2025?'</span>, <span class="nv">properties</span><span class="o">={}</span>, <span class="nv">messageType</span><span class="o">=</span>USER<span class="o">}</span>, total messages: 4
<span class="c"># Agent response indicating Jan 15, 2025 is available (as expected)</span>
Added message to conversation: AssistantMessage<span class="o">{</span><span class="nv">content</span><span class="o">=</span><span class="s1">'Great news, John! We do have rooms available on January 15th, 2025. If you'</span>d like to proceed with booking a room, please provide me with the details, such as the number of nights you<span class="s1">'d like to stay and any specific room preferences you may have.'</span>, <span class="nv">properties</span><span class="o">={</span><span class="nv">role</span><span class="o">=</span>ASSISTANT, <span class="nv">finishReason</span><span class="o">=</span>STOP, <span class="nb">id</span><span class="o">=</span>chatcmpl-96K3CelXXwBTlN0SJrBDFBDS9HmcD<span class="o">}</span>, <span class="nv">messageType</span><span class="o">=</span>ASSISTANT<span class="o">}</span>, total messages: 5
<span class="c"># User requests a booking</span>
Added message to conversation: UserMessage<span class="o">{</span><span class="nv">content</span><span class="o">=</span><span class="s1">'Please book 1 room for John on January 15h, 2025. The check-out date will be January 21st, 2025.'</span>, <span class="nv">properties</span><span class="o">={}</span>, <span class="nv">messageType</span><span class="o">=</span>USER<span class="o">}</span>, total messages: 6
<span class="c"># Agent response indicating the booking was successful</span>
Added message to conversation: AssistantMessage<span class="o">{</span><span class="nv">content</span><span class="o">=</span><span class="s1">'Your room has been successfully booked, John! You'</span>ll be staying from January 15th to January 21st, 2025. If you need any further assistance or confirmation details, please don<span class="s1">'t hesitate to ask. Enjoy your stay!'</span>, <span class="nv">properties</span><span class="o">={</span><span class="nv">role</span><span class="o">=</span>ASSISTANT, <span class="nv">finishReason</span><span class="o">=</span>STOP, <span class="nb">id</span><span class="o">=</span>chatcmpl-96K3HwfCdXdiUy7iRvrjo4NKHrWn7<span class="o">}</span>, <span class="nv">messageType</span><span class="o">=</span>ASSISTANT<span class="o">}</span>, total messages: 7
<span class="c"># User checks if a booking exists (Just to demonstrate the FindBookingTool)</span>
Added message to conversation: AssistantMessage<span class="o">{</span><span class="nv">content</span><span class="o">=</span><span class="s1">'Yes, John, you have a reservation with us. Here are the details of your booking:

- Check-in Date: January 15, 2025
- Check-out Date: January 21, 2025

If you need any further assistance regarding your booking or if there'</span>s anything <span class="k">else </span>I can <span class="nb">help </span>you with, please <span class="nb">let </span>me know.<span class="s1">', properties={role=ASSISTANT, finishReason=STOP, id=chatcmpl-96K3LfNZJsHYfXCOsfMT6HfTOFn0r}, messageType=ASSISTANT}, total messages: 9
</span></code></pre></div></div>

<p>We can see that the Agent is able to handle the different scenarios and provide the appropriate responses. The Agent is
invoking tools appropriately, tracking the conversation &amp; relaying accurate information. I’m happy with the results!</p>

<p>To button things up, I’ve adjusted the <code class="language-plaintext highlighter-rouge">SimpleAIController</code> to use the <code class="language-plaintext highlighter-rouge">BookingAgent</code>, which will enable us to interact
with the Agent via a simple REST API.</p>

<h2 id="conclusion">Conclusion</h2>

<p>This has been a fun exploration of the Spring AI framework. I was able to build a simple Hotel Booking Agent that was able
to handle a few different scenarios. As mentioned at the top of the article, Function Calling support is not available in
the latest stable release. Compared to the <code class="language-plaintext highlighter-rouge">LangChain4J</code> framework, Spring AI is a bit more complex to get started with,
and provides fewer out-of-the-box features. Furthermore, it appears that in its current form Function Calling isn’t 
very portable between models, but this will require further exploration.</p>

<p>One other quibble I have with the Spring AI Framework is that we do not get as much insight into or control over the function calling
under the hood. With <code class="language-plaintext highlighter-rouge">LangChain4J</code> we can stash the function requests/responses in a chat <code class="language-plaintext highlighter-rouge">MemoryStore</code> which would
enable agents to track their tool executions over a long form conversation. This doesn’t seem easily possible with the
current SNAPSHOT version of Spring AI. The common example that I use for this problem is a bot being exposed to a tool
that requires pagination. If the bot cannot track the pagination state (by seeing what it invoked a while ago), it will
get stuck in a loop requesting the first page.</p>
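
<p>To make the pagination concern concrete, here is a minimal, framework-free sketch. None of these types come from Spring AI or
<code class="language-plaintext highlighter-rouge">LangChain4J</code>; <code class="language-plaintext highlighter-rouge">ToolCall</code>, <code class="language-plaintext highlighter-rouge">ToolAwareMemory</code>, <code class="language-plaintext highlighter-rouge">nextPage</code> and the <code class="language-plaintext highlighter-rouge">ListBookings</code> tool name are hypothetical names
chosen for illustration. The only assumption is that tool invocations can be recorded next to the chat history in some way.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical record of one tool invocation kept alongside the chat history.
record ToolCall(String toolName, Map<String, Object> arguments, String result) {}

// Hypothetical memory that stores tool calls so later turns can inspect them.
class ToolAwareMemory {
    private final List<ToolCall> toolCalls = new ArrayList<>();

    void record(ToolCall call) {
        toolCalls.add(call);
    }

    // Next page to request for a tool: one past the highest page seen so far,
    // or 1 when the tool has never been called with a "page" argument.
    int nextPage(String toolName) {
        int lastPage = 0;
        for (ToolCall call : toolCalls) {
            if (call.toolName().equals(toolName)
                    && call.arguments().get("page") instanceof Integer page) {
                lastPage = Math.max(lastPage, page);
            }
        }
        return lastPage + 1;
    }
}

class PaginationDemo {
    public static void main(String[] args) {
        ToolAwareMemory memory = new ToolAwareMemory();
        memory.record(new ToolCall("ListBookings", Map.of("page", 1), "page 1 of 3"));
        // Because the earlier call is visible, the next request moves forward.
        System.out.println(memory.nextPage("ListBookings")); // prints 2
    }
}
```

<p>Without the recorded call, <code class="language-plaintext highlighter-rouge">nextPage</code> would keep returning 1, which is the loop described above.</p>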

<p>In all, I’m excited to see where the Spring AI project goes. I think it has a lot of potential, and I’m looking forward to
seeing how it evolves. The Spring team &amp; community have a great track record of building robust, developer-friendly tools, 
and I’m excited to see what the future holds for Spring AI.</p>

<p>The full code for today’s project as well as a Postman collection to interact with the Agent can be found 
<a href="https://github.com/johnsosoka/code-examples/tree/main/java/spring-ai-booking">Here</a>.</p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="Spring" /><category term="AI" /><category term="LLM" /><category term="Agent" /><category term="java" /><category term="AI" /><category term="function" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">The Dawn of Semi-Autonomous E-Commerce Agents</title><link href="https://www.johnsosoka.com/blog/2024/03/06/Dawn-Commerce-Agents.html" rel="alternate" type="text/html" title="The Dawn of Semi-Autonomous E-Commerce Agents" /><published>2024-03-06T00:00:00-07:00</published><updated>2024-03-06T00:00:00-07:00</updated><id>https://www.johnsosoka.com/blog/2024/03/06/Dawn-Commerce-Agents</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2024/03/06/Dawn-Commerce-Agents.html"><![CDATA[<p>I recently published an article, <a href="https://www.commerce-architects.com/post/the-dawn-of-semi-autonomous-e-commerce-agents">The Dawn of Semi-Autonomous E-Commerce Agents</a> for Commerce Architects,
in which I explain what an Agent is, how they might be used in commerce, and how they might be implemented. 
The article is a high-level overview of Agents and how they might be used in commerce. It is not a technical deep dive.</p>

<p>Read the full article <a href="https://www.commerce-architects.com/post/the-dawn-of-semi-autonomous-e-commerce-agents">here.</a></p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="LLM" /><category term="Agent" /><category term="unified" /><category term="framework" /><category term="components" /><category term="autonomous" /><category term="commerce" /><category term="planning" /><category term="execution" /><category term="memory" /><category term="profiling" /><category term="design" /><summary type="html"><![CDATA[I recently published an article, The Dawn of Semi-Autonomous E-Commerce Agents for Commerce Architects, in which I explain what an Agent is, how they might be used in commerce, and how they might be implemented. The article is a high-level overview over Agents and how they might be used in commerce. It is not a technical deep dive.]]></summary></entry><entry><title type="html">Extending the Memory of Large Language Models</title><link href="https://www.johnsosoka.com/blog/2024/02/19/extend-llm-memory.html" rel="alternate" type="text/html" title="Extending the Memory of Large Language Models" /><published>2024-02-19T00:00:00-07:00</published><updated>2024-02-19T00:00:00-07:00</updated><id>https://www.johnsosoka.com/blog/2024/02/19/extend-llm-memory</id><content type="html" xml:base="https://www.johnsosoka.com/blog/2024/02/19/extend-llm-memory.html"><![CDATA[<p><img src="https://media.johnsosoka.com/blog/2024-02-19/aimemart.png" alt="headimg" /></p>

<p>With the recent OpenAI announcement adding memory to ChatGPT, it seemed like a great time to write about <em>how</em> to add 
memory to large language models (LLMs). While the OpenAI announcement is new, giving LLMs persistent memory is not.</p>

<p>We will begin by covering the fundamental concepts of memory in LLMs, and then continue discussing some high-level strategies.
This post is focused on how to conceptualize memory management for LLMs rather than on a full implementation.</p>

<p>It is important to remember that this is a new and rapidly evolving field. The strategies discussed here are not exhaustive. 
Ultimately, the design of your LLM memory will depend on the specific requirements of your project. This article will set
the stage for different types of Retrieval Augmented Generation (RAG) strategies to be implemented in the context of extending
the dynamic memory of LLMs.</p>

<p><strong>Note:</strong> While LLMs are not limited to chatbots, I will be describing the memory strategies in the context of a chatbot
to keep things simple.</p>

<h2 id="conceptualizing-in-context-memory">Conceptualizing In-Context Memory</h2>

<p>At this time, LLMs are limited to a fixed-size context window. This means that there is a limit to the amount of information
that a model can process at a given time (measured in Tokens). With most LLM providers, the clients will maintain the context
and send the existing context along with the latest input to the model.</p>

<p>This can be a long string, but it’s generally easier to envision the context window as an Array of messages between the model
and the user.</p>

<p><img src="https://media.johnsosoka.com/blog/2024-02-19/contextwindow.png" alt="context window" /></p>

<p><em>Figure 1: Diagram of a context window</em></p>

<p>In the above diagram, the context window is represented as an array of messages. The most recent message is at the end of 
the array. The messages are ordered from oldest to newest. The “System Message” is almost always prevented from being
removed from the context window as it will contain critical information to the model–typically profiling instructions that
inform the LLM of its role &amp; desired behavior.</p>

<p>The Context Window memory is the “short term” memory of the LLM. It will typically be managed in memory on the application
or on something like Redis.</p>

<h3 id="sliding-windows">Sliding Windows</h3>

<p>A “sliding window” is a common strategy for managing the context window. This is where the context window is maintained
at a fixed number of messages or tokens. When the context window is full, the oldest message is removed to make room for the
newest message.</p>

<p>This strategy is straightforward, but eventually results in the loss of the oldest context messages unless a mechanism is
in place to persist them.</p>
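
<p>Although this post stays conceptual, a rough sketch may help picture the mechanics. The code below is one possible shape, not a
reference implementation: <code class="language-plaintext highlighter-rouge">ChatMessage</code> and <code class="language-plaintext highlighter-rouge">SlidingContextWindow</code> are hypothetical names, and the window is counted in messages
rather than tokens to keep things short. The system message is pinned, and the evicted message is handed back to the caller so that
it can be persisted elsewhere.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical message type: a role ("system", "user", "assistant") plus text.
record ChatMessage(String role, String content) {}

// Sliding window counted in messages; the system message is pinned and never evicted.
class SlidingContextWindow {
    private final ChatMessage systemMessage;
    private final int maxMessages; // capacity, excluding the system message
    private final Deque<ChatMessage> window = new ArrayDeque<>();

    SlidingContextWindow(ChatMessage systemMessage, int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.systemMessage = systemMessage;
        this.maxMessages = maxMessages;
    }

    // Adds a message; returns the evicted oldest message if the window was full.
    Optional<ChatMessage> add(ChatMessage message) {
        Optional<ChatMessage> evicted = Optional.empty();
        if (window.size() == maxMessages) {
            evicted = Optional.of(window.removeFirst());
        }
        window.addLast(message);
        return evicted;
    }

    // Context sent to the model: system message first, then oldest to newest.
    List<ChatMessage> context() {
        List<ChatMessage> context = new ArrayList<>();
        context.add(systemMessage);
        context.addAll(window);
        return context;
    }
}

class SlidingWindowDemo {
    public static void main(String[] args) {
        SlidingContextWindow window =
                new SlidingContextWindow(new ChatMessage("system", "You are a helpful assistant."), 2);
        window.add(new ChatMessage("user", "Hello"));
        window.add(new ChatMessage("assistant", "Hi there"));
        // The window is full, so adding a third message evicts "Hello".
        window.add(new ChatMessage("user", "What can you do?"))
                .ifPresent(evicted -> System.out.println("Evicted: " + evicted.content()));
    }
}
```

<p>Returning the evicted message, rather than silently dropping it, leaves the decision about what to do with it to the caller.</p>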

<h4 id="reserved-indices">Reserved Indices</h4>

<p>Before we can attach a persistent memory store to the LLM we will first need a place to put the data. As mentioned before,
the context-window for Language Models is limited, so we will need to use a location within the context window to insert
information fetched from a persistent memory store. If the application can identify relevant information from the persistent
memory store, it can insert it into the context window at a reserved index for the model to use.</p>

<p><img src="https://media.johnsosoka.com/blog/2024-02-19/context-window-reserved.png" alt="reserved indices" /></p>

<p><em>Figure 2: Reserved Indices in the Context Window</em></p>

<p>In the above diagram, we can see that the context window has been adjusted to include reserved indices. This effectively
reduces the size of the context window for unabridged, unreserved messages–but it does provide a placeholder for messages
that are retrieved from a persistent memory store.</p>

<p><strong>Note:</strong> Some retrieval strategies will simply append the injected context messages at the end of the context window. I
have had success with both strategies, but I personally prefer more granular control over the context window.</p>

<h4 id="persisting-and-retrieving-context">Persisting and Retrieving Context</h4>

<p>When we write code to manage the context window, we can create mechanisms to persist &amp; retrieve data that has been removed
from the context window.</p>

<div style="text-align: center;">
    <img src="https://media.johnsosoka.com/blog/2024-02-19/memstore-persist.png" alt="persisting context" style="width: 80%; height: auto;" />
</div>
<p><br />
<em>Figure 3: Persisting Context</em></p>

<p>Above, we can see the high-level approach for an application to balance the management of both “short-term” and “long-term” 
memory. As the context window fills up, the application can save the oldest messages to a persistent memory store.</p>

<p><strong>Note:</strong> As mentioned at the top of this post, the implementation details depend on your use case. The persisted messages could
be summaries, they could be converted to vector embeddings, they could be entries in a SQL database or Graph database, etc.</p>

<div style="text-align: center;">
    <img src="https://media.johnsosoka.com/blog/2024-02-19/memstore-retrieve.png" alt="retrieving context" style="width: 70%; height: auto;" />
</div>
<p><br />
<em>Figure 4: Retrieving Context</em></p>

<p>Figure 4 demonstrates the flip-side of the equation. When new messages are received, the application can query the persistent
memory store for relevant messages and insert them into the context window at the reserved indices.</p>

<p>We finally made it to Retrieval Augmented Generation, or <strong>RAG</strong>! Above is the high-level retrieval mechanism. The existing context
and new message <em>can both</em> be utilized to retrieve relevant information from the persistent memory store. The retrieved 
information is then injected into the context, giving the model access to a larger pool of information than it would have
had otherwise.</p>
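
<p>As a rough illustration of the retrieval flow, the sketch below persists evicted messages and places retrieved ones in
reserved slots directly after the system message. Everything here is hypothetical: <code class="language-plaintext highlighter-rouge">KeywordMemoryStore</code> and
<code class="language-plaintext highlighter-rouge">ContextAssembler</code> are made-up names, messages are plain strings, and relevance is a naive word overlap that stands in for
whatever a real store would provide (vector similarity, SQL, graph queries).</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical long-term store with naive word-overlap relevance.
class KeywordMemoryStore {
    private final List<String> memories = new ArrayList<>();

    void persist(String evictedMessage) {
        memories.add(evictedMessage);
    }

    // Returns up to 'limit' stored messages sharing at least one word with the
    // query, best match first.
    List<String> retrieve(String query, int limit) {
        Set<String> queryWords = words(query);
        return memories.stream()
                .filter(memory -> overlap(queryWords, memory) > 0)
                .sorted(Comparator.comparingInt((String memory) -> overlap(queryWords, memory)).reversed())
                .limit(limit)
                .toList();
    }

    private static Set<String> words(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        result.remove(""); // split can yield an empty leading token
        return result;
    }

    private static int overlap(Set<String> queryWords, String memory) {
        Set<String> shared = words(memory);
        shared.retainAll(queryWords);
        return shared.size();
    }
}

// Builds the context as: system message, reserved slots, then recent messages.
class ContextAssembler {
    static List<String> assemble(String systemMessage, List<String> retrieved,
                                 List<String> recentMessages, int reservedSlots) {
        List<String> context = new ArrayList<>();
        context.add(systemMessage);
        // Never use more retrieved messages than there are reserved slots.
        context.addAll(retrieved.subList(0, Math.min(reservedSlots, retrieved.size())));
        context.addAll(recentMessages);
        return context;
    }
}

class RetrievalDemo {
    public static void main(String[] args) {
        KeywordMemoryStore store = new KeywordMemoryStore();
        store.persist("My dog is named Biscuit");
        store.persist("I prefer window seats on flights");

        String newMessage = "What is my dog called?";
        List<String> retrieved = store.retrieve(newMessage, 2);
        List<String> context =
                ContextAssembler.assemble("You are a helpful assistant.", retrieved, List.of(newMessage), 2);
        context.forEach(System.out::println);
    }
}
```

<p>Here only the new message drives retrieval; the existing context could be folded into the query in the same way.</p>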

<h3 id="going-a-little-deeper">Going (a little) Deeper</h3>

<p>Now that we have a high-level understanding of memory management for LLMs, we can <em>briefly</em> touch on some more specific strategies
in a little more detail.</p>

<p><strong>Note:</strong> As a pattern, there is nothing stopping us from using an LLM to manage the persistence &amp; retrieval of data. That is
to say we can have one instance of an LLM that is responsible for managing the context window and another instance of an LLM
that is responsible for having a conversation. I like to think of this as a “subconscious.” That is, a reasoning layer that
helps the LLM make decisions about what to remember and what to forget.</p>

<ul>
  <li><strong>Graph Databases</strong>: Graph databases can be enormously powerful, particularly if the use case of your LLM is to manage
relationships between entities or concepts. As messages “slide” out of the context window, they can be stored in a graph
for the LLM to query later.</li>
  <li><strong>Vector Embeddings</strong>: Vector embeddings are a natural fit for LLMs. Messages that slide out of the context window
can be converted into vector embeddings, stored in a vector database &amp; then retrieved based on semantic relevance to
user queries.</li>
  <li><strong>SQL Databases</strong>: SQL databases are a great candidate for storing unabridged messages that have been removed from the
context window. Code can be written to fetch N number of messages from the persistent data store. An Agent can then
evaluate those messages before they are inserted into the context window (perhaps the information isn’t useful, or it needs
to be summarized).</li>
</ul>

<p>LLM applications with memory can be designed in a great many ways. If you’re building an autonomous agent instead of 
a chatbot, the design might be different. Instead of summarizing and retrieving conversation details, it might be
saving lessons learned. For example, if a tool execution fails to align with a plan, the agent can save this information 
as a lesson to be retrieved before the next planning &amp; execution cycle.</p>

<h3 id="conclusion">Conclusion</h3>

<p>Today we covered the high-level concepts of adding memory to LLMs. We discussed how to conceptualize the context window,
sliding windows, reserved indices &amp; retrieval augmented generation. We also briefly touched on some more specific strategies
and how they might be implemented.</p>

<p>I hope that this article has provided a high-level understanding of how to add memory to LLMs. In future posts, I will
be taking a deeper dive into implementing some of these strategies. Feel free to reach out on <a href="https://www.linkedin.com/in/john-sosoka-4528b970/">LinkedIn</a> or
the <a href="/contact/">contact page</a> if you have any questions or comments.</p>]]></content><author><name>John Sosoka</name><email>web@johnsosoka.com</email></author><category term="blog" /><category term="LLM" /><category term="java" /><category term="memory" /><category term="context" /><category term="management" /><category term="RAG" /><category term="retrieval" /><category term="augmented" /><category term="generation" /><category term="chatbot" /><category term="persistent" /><category term="memory" /><category term="data" /><category term="management" /><category term="graph" /><category term="sql" /><category term="node" /><category term="redis" /><summary type="html"><![CDATA[]]></summary></entry></feed>