Skip to main content

New top story on Hacker News: Show HN: Langfuse – Open-source observability and analytics for LLM apps

Show HN: Langfuse – Open-source observability and analytics for LLM apps
32 by marcklingen | 3 comments on Hacker News.
Hi HN! Langfuse is OSS observability and analytics for LLM applications (repo: https://ift.tt/3axGY9U , 2 min demo: https://ift.tt/BPxy9GL , try it yourself: https://ift.tt/3LSZFgK ) Langfuse makes capturing and viewing LLM calls (execution traces) a breeze. On top of this data, you can analyze the quality, cost and latency of LLM apps. When GPT-4 dropped, we started building LLM apps – a lot of them! [1, 2] But they all suffered from the same issue: it’s hard to assure quality in 100% of cases and even to have a clear view of user behavior. Initially, we logged all prompts/completions to our production database to understand what works and what doesn’t. We soon realized we needed more context, more data and better analytics to sustainably improve our apps. So we started building a homegrown tool. Our first task was to track and view what is going on in production: what user input is provided, how prompt templates or vector db requests work, and which steps of an LLM chain fail. We built async SDKs and a slick frontend to render chains in a nested way. It’s a good way to look at LLM logic ‘natively’. Then we added some basic analytics to understand token usage and quality over time for the entire project or single users (pre-built dashboards). Under the hood, we use the T3 stack (Typescript, NextJs, Prisma, tRPC, Tailwind, NextAuth), which allows us to move fast + it means it's easy to contribute to our repo. The SDKs are heavily influenced by the design of the PostHog SDKs [3] for stable implementations of async network requests. It was a surprisingly inconvenient experience to convert OpenAPI specs to boilerplate Python code and we ended up using Fern [4] here. We’re fans of Tailwind + shadcn/ui + tremor.so for speed and flexibility in building tables and dashboards fast. Our SDKs run fully asynchronously and make network requests in the background. We did our best to reduce any impact on application performance to a minimum. We never block the main execution path. We've made two engineering decisions we've felt uncertain about: to use a Postgres database and Looker Studio for the analytics MVP. Supabase performs well at our scale and integrates seamlessly into our tech stack. We will need to move to an OLAP database soon and are debating if we need to start batching ingestion and if we can keep using Vercel. Any experience you could share would be helpful! Integrating Looker Studio got us to first analytics charts in half a day. As it is not open-source and does not work with our UI/UX, we are looking to switch it out for an OSS solution to flexibly generate charts and dashboards. We’ve had a look at Lightdash and would be happy to hear your thoughts. We’re borrowing our OSS business model from Posthog/Supabase who make it easy to self-host with features reserved for enterprise (no plans yet) and a paid version for managed cloud service. Right now all of our code is available under a permissive license (MIT). Next, we’re going deep on analytics. For quality specifically, we will build out model-based evaluations and labeling to be able to cluster traces by scores and use cases. Looking forward to hearing your thoughts and discussion – we’ll be in the comments. Thanks! [1] https://ift.tt/E7zCYS8 [2] https://ift.tt/1cwVhfJ [3] https://ift.tt/iLGdArg [4] https://ift.tt/vAmWTb2

Comments

Popular posts from this blog

New top story on Hacker News: Tell HN: I think I found Toyota's battery

Tell HN: I think I found Toyota's battery 173 by scythe | 29 comments on Hacker News. Recently there was a thread about a "breakthrough" in battery technology at Toyota. https://ift.tt/nUtv4yY Toyota has been putting out PR puff pieces about their "solid-state" (solid-electrolyte) batteries for years, but this story was unique in that it had a quote from Keiji Kaita, who holds some high-level role at Toyota. Anyway, I didn't think much of it, because there was no paper referenced in the Guardian article, which seemed to be the original source. But while reading about something else, I came across the paper "A near dimensionally invariable high-capacity positive electrode material", published in Nature Materials last December: https://ift.tt/24ZXPy5 This paper, reporting a cathode that has very little (much less than normal) change in size or shape when charged and discharged, claims reversible storage with a solid electrolyte. It stands to reaso...

New top story on Hacker News: Show HN: Neucards – Privacy based digital contact card

Show HN: Neucards – Privacy based digital contact card 7 by bdominy | 1 comments on Hacker News. Neucards is an end-to-end encrypted contact information sharing and updating iOS app that protects your identity while letting you keep in touch with people. I started working on neucards as a side project more than ten years ago, and I decided three years ago to go full-time and try to build a community around it. There are two major problems that neucards addresses. First, most people end up with contact lists that are hopelessly out of date. Over time, people move, change jobs, or add social profiles and unless they tell you, chances are you could lose touch. Second, your contact information ends up in the wrong hands. There has been a huge increase in robocalls, unsolicited emails, data breaches, and online scams that is driven by accessing a person's contact info. Even worse, with AI now being able to imitate a person's voice or other mannerisms, knowledge about the connecti...