#299 Personal search engine with datasette and dogsheep
1 hr 2 min
In this episode, we'll be discussing two powerful tools for data reporting and exploration: Datasette and Dogsheep.

Datasette helps people take data of any shape or size, analyze and explore it, and publish it as an interactive website and accompanying API.
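
Every Datasette instance serves its data as a JSON API alongside the HTML pages. The sketch below is a minimal illustration that builds a query URL for that API. It assumes the /<database>.json?sql=... endpoint and the _shape=array option (rows returned as a list of JSON objects) described in the Datasette documentation. The host, database, and table names are hypothetical placeholders.

```python
from urllib.parse import urlencode


def datasette_sql_url(base_url: str, database: str, sql: str) -> str:
    """Build a Datasette JSON API URL that runs a read-only SQL query.

    Assumes the /<database>.json?sql=... endpoint and the _shape=array
    option, which asks Datasette to return rows as a list of objects.
    """
    query = urlencode({"sql": sql, "_shape": "array"})
    # Strip a trailing slash so we never produce "//" in the path.
    return f"{base_url.rstrip('/')}/{database}.json?{query}"


url = datasette_sql_url(
    "https://datasette.example.com",  # hypothetical Datasette host
    "mydata",                         # hypothetical database name
    "select * from items limit 5",    # hypothetical table "items"
)
print(url)
```

Fetching that URL, for example with urllib.request.urlopen, should return the matching rows as JSON, which is what lets the same published database act as both a website and an API.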

Dogsheep is a collection of tools for personal analytics using SQLite and Datasette. Imagine a unified search engine for everything personal in your life, such as Twitter, photos, Google Docs, Todoist, Goodreads, and more, all in one place and outside of cloud companies.
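
Datasette can run full-text searches against SQLite tables that have a search index, which is what makes a single personal search box practical. The sketch below only illustrates that idea and is not how the Dogsheep tools are implemented. It assumes a hypothetical single items table, made-up sample rows, and a SQLite build that includes the FTS5 extension (most Python builds do).

```python
import sqlite3


def build_index(items):
    """Load (source, body) pairs into an in-memory SQLite FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one full-text-indexed row per personal item,
    # tagged with the service it came from.
    conn.execute("CREATE VIRTUAL TABLE items USING fts5(source, body)")
    conn.executemany("INSERT INTO items (source, body) VALUES (?, ?)", items)
    return conn


def search(conn, query):
    """Return (source, body) rows matching an FTS5 query, best match first."""
    return conn.execute(
        "SELECT source, body FROM items WHERE items MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


conn = build_index([
    ("twitter", "Trying out Datasette on my own tweets"),
    ("goodreads", "Finished a book about SQLite internals"),
    ("todoist", "Buy milk"),
])
print(search(conn, "sqlite"))  # [('goodreads', 'Finished a book about SQLite internals')]
```

Because every row carries its source, one query covers tweets, books, and to-dos at once. The real importers create their own, richer tables rather than a single shared one.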

In this episode, we talk with Simon Willison, who created both of these projects. He's also one of the co-creators of Django, and we'll discuss some early Django history!

Links from the show

Datasette: datasette.io
Dogsheep: dogsheep.github.io
Datasette newsletter: datasette.substack.com
Video: Build your own data warehouse for personal analytics with SQLite and Datasette: youtube.com

Examples
List: github.com
Personal data warehouses: github.com
Global power plants: datasettes.com
SF data: datasettes.com
FiveThirtyEight: fivethirtyeight.datasettes.com
Lahman’s Baseball Database: baseballdb.lawlesst.net
Live demo of current main: datasette.io

Sponsors

Linode
Talk Python Training
Python Bytes
Michael Kennedy and Brian Okken
#222 Autocomplete with type annotations for AWS and boto3
Sponsored by Linode! pythonbytes.fm/linode
Special guest: Greg Herrera
YouTube live stream for viewers: Watch on YouTube

Michael #1: boto type annotations
* via Michael Lerner
* boto3's services are created at runtime, so IDEs can't index its code to provide code completion or to infer the types of these services or of the objects they create.
* Type systems cannot verify them either.
* Even if they could, clients and service resources are created by a service-agnostic factory method and are identified only by a string argument to that method.
* boto3_type_annotations defines stand-in classes for the clients, service resources, paginators, and waiters provided by boto3's services.
* Examples compared: "bare" boto3 vs. annotated boto3.

Brian #2: How to have your code reviewer appreciate you
* By Michael Lynch
* Suggested by Miłosz Bednarzak
* Actual title: “How to Make Your Code Reviewer Fall in Love with You”
  * but 🤮
  * It even has the words “your reviewer will literally fall in love with you.”
  * literally → figuratively, please
* The topic is important, though. Here are some good tips:
  * Review your own code first.
    * “Don’t just check for mistakes — imagine reading the code for the first time. What might confuse you?”
  * Write a clear change list description.
    * “A good change list description explains what the change achieves, at a high level, and why you’re making this change.”
  * Narrowly scope changes.
  * Separate functional and non-functional changes.
    * This is tough, even for me, but important.
    * If you need to fix something and the formatting is a nightmare, so you feel you must blacken it, do those things in two separate merge requests.
  * Break up large change lists.
    * A ton to write about? Maybe it deserves 2-3 merges instead of 1.
  * Respond graciously to critiques.
    * It can feel like a personal attack, but hopefully it’s not.
    * Responding defensively will only make things worse.
Greg #3: REPODASH - Quality Metrics for GitHub repositories
* by Laurence Molloy
* Do you maintain a project codebase on GitHub?
* Would you like to be able to show the maturity of your project at a glance?
* Walk through the metrics available
* Use-case

Michael #4: Extra, extra, extra, extra, hear all about it
* Python 3 Float Security Bug
* Building Python 3 from source now :-/ It’s still Python 3.8.5 on Ubuntu, even with the kernel patch just today! (Linux 5.4.0-66 / Ubuntu 20.04.2)
* Finally, I’m Dockering on my M1 Mac via:

    docker context create remotedocker --docker "host=ssh://user@server"
    docker context use remotedocker

  After that, docker run -it ubuntu:latest bash works as usual, but remotely!
* Why I keep complaining about the merge thing on Dependabot. Why!?! ;)
  * Anthony Shaw wrote a bot to help alleviate this a bit. More on that later.

Brian #5: testcontainers-python
* Suggested by Josh Peak
* Why mock a database? Spin up a live one in a Docker container.
* “Python port for testcontainers-java that allows using docker containers for functional and integration testing. Testcontainers-python provides capabilities to spin up docker containers (such as a database, Selenium web browser, or any other container) for testing.”

    import sqlalchemy
    from testcontainers.mysql import MySqlContainer

    with MySqlContainer('mysql:5.7.32') as mysql:
        engine = sqlalchemy.create_engine(mysql.get_connection_url())
        # SQLAlchemy 1.4+/2.0 style: run raw SQL through a Connection
        with engine.connect() as conn:
            version, = conn.execute(sqlalchemy.text("select version()")).fetchone()
        print(version)  # 5.7.32

* The snippet above spins up a MySQL database in a container. The get_connection_url() convenience method returns a SQLAlchemy-compatible URL, which we use to connect to the database and retrieve its version.

Greg #6: The Python Ecosystem is relentlessly improving price-performance every day
* Python is reaching top-of-mind for more and more business decision-makers because their technology teams are delivering solutions to the business with unprecedented price-performance.
* The business impact keeps getting better and better.
* What seems like heavy adoption throughout the economy is still a relatively small inroad compared to what we’ll see in the future. It’s like water rapidly collecting behind a weak dam.
* It’s an exciting time to be in the Python world!

Extras:
Brian:
* Firefox 86 enhances cookie protection
  * Sites can save cookies but can’t share them between sites.
  * Firefox maintains separate cookie storage for each site.
  * Momentary exceptions are allowed for some non-tracking cross-site cookie uses, such as popular third-party login providers.

Joke: 56 Funny Code Comments That People Actually Wrote
These are actually in a code base somewhere (a sampling):

    /*
     * Dear Maintainer
     *
     * Once you are done trying to ‘optimize’ this routine,
     * and you have realized what a terrible mistake that was,
     * please increment the following counter as a warning
     * to the next guy.
     *
     * total_hours_wasted_here = 73
     */

    // sometimes I believe compiler ignores all my comments

    // drunk, fix later

    // Magic. Do not touch.

    /*** Always returns true ***/
    public boolean isAvailable() {
        return false;
    }
38 min
Towards Data Science
The TDS team
72. Margot Gerritsen - Does AI have to be understandable to be ethical?
As AI systems have become more ubiquitous, people have begun to pay more attention to their ethical implications. Those implications are potentially enormous: Google’s search algorithm and Twitter’s recommendation system each have the ability to meaningfully sway public opinion on just about any issue. As a result, Google and Twitter’s choices have an outsized impact — not only on their immediate user base, but on society in general.

That kind of power comes with risk of intentional misuse (for example, Twitter might choose to boost tweets that express views aligned with their preferred policies). But while intentional misuse is an important issue, equally challenging is the problem of avoiding unintentionally bad outputs from AI systems. Unintentionally bad AIs can lead to various biases that make algorithms perform better for some people than for others, or more generally to systems that are optimizing for things we actually don’t want in the long run. For example, platforms like Twitter and YouTube have played an important role in the increasing polarization of their US (and worldwide) user bases. They never intended to do this, of course, but their effect on social cohesion is arguably the result of internal cultures based on narrow metric optimization: when you optimize for short-term engagement, you often sacrifice long-term user well-being.

The unintended consequences of AI systems are hard to predict, almost by definition. But their potential impact makes them very much worth thinking and talking about — which is why I sat down with Stanford professor, co-director of the Women in Data Science (WiDS) initiative, and host of the WiDS podcast Margot Gerritsen for this episode of the podcast.
1 hr 22 min
Google Cloud Platform Podcast
Google Cloud Platform
Cloud Spanner Revisited with Dilraj Kaur and Christoph Bussler
Mark Mirchandani and Stephanie Wong are back this week as we learn about all the new things happening with Google Cloud Spanner. Our guests this week, Dilraj Kaur and Christoph Bussler, describe Cloud Spanner as a fully managed relational database that boasts unlimited scaling and advanced consistency and availability. Unlimited scaling truly means unlimited, and Chris explains why Cloud Spanner offers this feature and how it’s making database design and development easier.

Dilraj and Chris tell us all about the cool new features Spanner has developed, like generated columns and foreign keys, and how customer needs influenced these developments. Chris walks us through the process of using some of these new features, including how developers can monitor their database systems. Managed backups and multi-region configurations are also recent additions to Cloud Spanner, and our guests explain how current enterprise clients use them. Dilraj and Chris explain which Spanner features are managed automatically and which are managed by the customer, and how people set up and manage database projects. We hear examples of companies using Cloud Spanner and how it has improved their businesses.

Dilraj Kaur
Dilraj Kaur is an Enterprise Customer Engineer specializing in Data Management. She has been with Google for about 2.5 years and is based in Atlanta.

Christoph Bussler
As a Solutions Architect, Chris focuses on databases, data migration, and data integration in enterprise customer settings. See his professional work and background on his website.

Cool things of the week
* New to Google Cloud? Here are a few free trainings to help you get started blog
* Start your skills challenge today site
* Service Directory is generally available: Simplify your service inventory blog

Interview
* Google Cloud Spanner site
* GCP Podcast Episode 62: Cloud Spanner with Deepti Srivastava podcast
* Using the Cloud Spanner Emulator docs
* Cloud Spanner Ecosystem site
* Cloud Spanner Qwiklabs site
* Google Cloud Platform Community On Slack site
* Creating and managing generated columns docs
* WITH Clause docs
* Foreign Keys docs
* Numeric Data Type docs
* Information schema docs
* Overview of introspection tools docs
* Backup and Restore docs
* Multi-region configurations docs
* ShareChat: Building a scalable data-driven social network for non-English speakers globally site
* Blockchain.com: Streamlining infrastructure for the world’s most dynamic financial market site
* What is Cloud Spanner? video

What’s something cool you’re working on?
Mark has been working on budgeting blog posts, including Protect your Google Cloud spending with budgets. Stephanie is working on her data center animation series.
41 min