Generating accessible pinouts

Motivation

Lately I’ve been working on projects to support blind people who are interested in “maker” hobbies like electronics. There are blind ham radio operators and electronics hobbyists, and at least one professional blind electrical engineer. While electronics could be a great pursuit for more blind people, there are many barriers. As a sighted person, I have access to a huge number of interesting resources to teach myself how to build projects, but almost all of them rely on vision to convey vital information. Because vision is such a dominant and useful sense, it’s often the only way certain topics in electrical engineering - from waveforms to board layouts to circuit designs - are represented.

The idea

The goal of this project is to make as many component pinouts as possible accessible to blind people. Most commonly these pinouts appear as either A) an image, or B) a line drawing within a PDF datasheet. Often there is a table of pin assignments in a datasheet as well, but that’s not very helpful if you don’t know how the pin numbers correspond to the physical part.

While blind users may have an embosser capable of producing tactile graphics, many do not. Therefore the “lowest common denominator” that blind internet users all rely on is a screen reader that converts text to spoken words. By providing a textual description of the part and a readable table of pin names, I’m hoping to make design and prototyping easier.

Rather than trying to interpret pinout images, I looked for a large database of pin assignments freely available in machine readable format. One such database is the KiCAD symbol library that powers the increasingly popular free and open source KiCAD PCB design software. This has the structured information I need about thousands of parts, namely 1) the part’s model number, 2) the part’s package type, and 3) the part’s pin numbers and pin names.

Code link

My code for doing this lives in the accessible-pinouts GitHub repo. The main logic is in generate.py.

Extracting symbol information

KiCAD symbol libraries consist of a number of .kicad_sym files, each of which is in a text-based S-expression format. Each file can contain multiple symbol definitions. Here are just a few lines to give you an example of the format:

(kicad_symbol_lib
	(version 20231120)
	(generator "kicad_symbol_editor")
	(generator_version "8.0")
	(symbol "74469"
		(pin_names
			(offset 1.016)
		)
		(exclude_from_sim no)
		(in_bom yes)
		(on_board yes)
		(property "Reference" "U"
			(at -7.62 19.05 0)
			(effects
				(font
					(size 1.27 1.27)
				)
			)
		)

As you can tell, this is pretty verbose and not easy to read, but it is fairly simple to parse. There’s already a great Python library called kiutils that will turn these code stanzas into trees of Python objects. For our purposes, we really only care about the symbol type. Remember, we’re trying to figure out the following information:

Part number or name
Package type
Pin number to name mapping for all pins

Below I’ll describe how we can extract these details using kiutils.

Basic part metadata

The part model number (or name) is very simple: symbol.entryName.

Each symbol has a set of Property entries under the properties attribute. This is essentially a dictionary but it’s not parsed into one, so when we want a particular key we need to loop through the properties to find it. I have a helper function called prop_value to do that. Common useful properties include Description which gives a short description of the part, and Datasheet with a link to a datasheet PDF.
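A helper like prop_value can be very small. The sketch below is only an illustration, not the code from generate.py: it assumes each property object exposes key and value attributes (which I believe matches kiutils' Property class), and it uses SimpleNamespace stand-ins with a made-up description so it runs without any library file.

```python
from types import SimpleNamespace
from typing import Optional


def prop_value(symbol, key: str) -> Optional[str]:
    """Return the value of the first property named `key`, or None if absent."""
    for prop in symbol.properties:
        if prop.key == key:
            return prop.value
    return None


# Stand-in for a parsed symbol; real objects would come from the parser.
symbol = SimpleNamespace(
    entryName="74469",
    properties=[
        SimpleNamespace(key="Reference", value="U"),
        SimpleNamespace(key="Description", value="(placeholder description)"),
    ],
)

print(symbol.entryName, prop_value(symbol, "Reference"))
```

Returning None for a missing key keeps the "does this part have a Footprint?" check a simple truthiness test.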

Package type

Package type is a little more complicated and I didn’t find great documentation about it. For this one, we want to get the name of the part’s footprint. In PCB design, a “footprint” defines the pattern of copper pads and silkscreen lines that should be fabricated on the board to accommodate a component. Because parts mostly come in one of a few dozen standard package types, there are common footprints for parts with the same package and number of pins (for example, a DIP-20 footprint). If we know the footprint name, we can easily tell the package type most of the time.

Many parts have a single footprint in the properties with the name Footprint. I believe this is the older way to define part footprints and it’s much easier to deal with for our purposes because each symbol ID will have only one footprint. Here are some example values from real part footprint names, along with their counts in the dataset:

'Package_SO:SOIC-8_3.9x4.9mm_P1.27mm': 792,
'Package_TO_SOT_SMD:SOT-23-5': 611,
'Package_QFP:LQFP-64_10x10mm_P0.5mm': 414,
'Package_QFP:LQFP-100_14x14mm_P0.5mm': 360,
'Package_TO_SOT_SMD:SOT-23': 322,
'Package_QFP:LQFP-48_7x7mm_P0.5mm': 320,
'Package_DIP:DIP-8_W7.62mm': 288,
'Package_TO_SOT_SMD:SOT-23-6': 230,
'Package_DFN_QFN:QFN-48-1EP_7x7mm_P0.5mm_EP5.6x5.6mm': 215,
'Package_QFP:LQFP-144_20x20mm_P0.5mm': 208,
'Package_SO:MSOP-8_3x3mm_P0.65mm': 172,
'Diode_SMD:D_SMA': 172,
'Package_DFN_QFN:QFN-32-1EP_5x5mm_P0.5mm_EP3.45x3.45mm': 168,
'Package_SO:SOIC-16W_7.5x10.3mm_P1.27mm': 162,
'Package_SO:TSSOP-20_4.4x6.5mm_P0.65mm': 157,
'Package_TO_SOT_THT:TO-92_Inline': 151,
'Package_TO_SOT_THT:TO-220-3_Vertical': 151,
'Package_QFP:LQFP-32_7x7mm_P0.8mm': 145,
'Package_TO_SOT_SMD:SOT-89-3': 134,
'Filter:Filter_Mini-Circuits_FV1206': 134,
'Package_TO_SOT_SMD:SOT-363_SC-70-6': 129,

However, many parts don’t have a Footprint value in properties. For these parts there is usually a key called ki_fp_filters that contains one or more string values separated by spaces. Here’s an example:

(property "ki_fp_filters" "SSOP*3.9x4.9mm*P0.635mm* TSSOP*4.4x5mm*P0.65mm* TVSOP*4.4x3.6mm*P0.4mm* SOIC*3.9x9.9mm*P1.27mm*"
...

These are basically wildcard variations of the explicit Package footprint names, and a symbol can have more than one footprint filter. I’ve decided to handle these in a very simple way: I treat each symbol+footprint combination as its own part in my script. So if a part has a Footprint key, I call my process_part_instance() function once with the symbol and that footprint value. If it instead has 5 ki_fp_filters values, I call it 5 times with the same symbol and each of the footprint filter values in turn.
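Here is a rough sketch of that fan-out. The footprint_candidates helper and the dictionary of properties are assumptions made for illustration, and process_part_instance() is reduced to a placeholder that just returns a label.

```python
from typing import Dict, List


def footprint_candidates(props: Dict[str, str]) -> List[str]:
    """Return the explicit footprint if set, else each ki_fp_filters entry."""
    footprint = props.get("Footprint", "").strip()
    if footprint:
        return [footprint]
    # Filters are separated by whitespace; split() also drops empty strings.
    return props.get("ki_fp_filters", "").split()


def process_part_instance(symbol_name: str, footprint: str) -> str:
    # Placeholder: the real function would build one pinout page.
    return f"{symbol_name} / {footprint}"


props = {"ki_fp_filters": "SSOP*3.9x4.9mm*P0.635mm* TSSOP*4.4x5mm*P0.65mm*"}
pages = [process_part_instance("EXAMPLE", fp) for fp in footprint_candidates(props)]
print(pages)
```

Treating an empty Footprint string the same as a missing one means both cases fall through to the filters.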

The ki_fp_filters footprint strings don’t match the Footprint property strings exactly, but for our purposes that doesn’t matter. As we’ll see later, all that matters is that I can make a list of what Footprint values and ki_fp_filters values correspond to what packages, and there are a finite number of those values to map.

Pin assignments

Pins are found in a symbol’s units list. Each unit may have a set of pins, and more than one unit can have pins, but the pin numbers never overlap. Some units have no pins at all. I didn’t delve into why this unit structure exists, but for those with multiple units with pins there’s a simple explanation: some chips contain multiple functional subunits. For example, the TSM102 has 4 op-amps in a single package, and its KiCAD symbol has 5 units (power pins are in their own unit). I’m simply ignoring the unit boundaries for now and putting all the pins into a single list, but I needed to understand the reason behind the multi-unit parts to know that this was valid.
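Flattening the units might look roughly like this. It is a sketch under assumptions: each unit is taken to have a pins list whose entries carry number and name attributes (as I understand kiutils to expose them), pin numbers are treated as strings, and the example symbol is invented rather than taken from the library.

```python
from types import SimpleNamespace
from typing import List, Tuple


def pin_sort_key(number: str):
    # Numeric pin numbers sort numerically; others (e.g. "A1") sort after them.
    return (0, int(number), "") if number.isdigit() else (1, 0, number)


def collect_pins(symbol) -> List[Tuple[str, str]]:
    """Merge the pins of every unit into one sorted (number, name) list."""
    pins = {}
    for unit in symbol.units:
        for pin in unit.pins:
            pins[pin.number] = pin.name
    return sorted(pins.items(), key=lambda item: pin_sort_key(item[0]))


def unit(*pairs):
    # Test helper: build a stand-in unit from (number, name) pairs.
    return SimpleNamespace(pins=[SimpleNamespace(number=n, name=m) for n, m in pairs])


# Invented multi-unit symbol: one empty unit, two functional units, one power unit.
symbol = SimpleNamespace(units=[
    unit(),
    unit(("1", "OUT_A"), ("2", "IN_A-"), ("3", "IN_A+")),
    unit(("5", "IN_B+"), ("6", "IN_B-"), ("7", "OUT_B")),
    unit(("4", "V-"), ("8", "V+")),
])

for number, name in collect_pins(symbol):
    print(number, name)
```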

Inheritance

After getting all of the above to work, I still found that many common parts had no pinout. The reason? Inheritance. KiCAD has the concept of part inheritance, where a part can have a parent part. Pins and symbol information must come from the parent; the child part only overrides the properties. The parent part is identified by the part number in the extends property of the child, and must be in the same symbol lib as the child. So in process_symbol I look for an extends property. If there is one, I load that symbol as well and use the parent symbol for the pinout, and the child symbol for the model number, description, and footprint.
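The parent lookup can be sketched as below. The function name resolve_symbols, the symbols_by_name dictionary, and the part names are hypothetical; the only thing taken from the text is that a child names its parent in extends and that the parent lives in the same library.

```python
from types import SimpleNamespace


def resolve_symbols(symbol, symbols_by_name):
    """Return (metadata_symbol, pin_symbol) for a possibly inherited part."""
    parent_name = getattr(symbol, "extends", None)
    if not parent_name:
        # No inheritance: the symbol supplies both metadata and pins.
        return symbol, symbol
    parent = symbols_by_name.get(parent_name)
    if parent is None:
        raise KeyError(f"parent symbol {parent_name!r} not found in this library")
    return symbol, parent


# Hypothetical library with one parent and one child that extends it.
parent = SimpleNamespace(entryName="PARENT_PART", extends=None)
child = SimpleNamespace(entryName="CHILD_PART", extends="PARENT_PART")
library = {s.entryName: s for s in (parent, child)}

meta, pin_source = resolve_symbols(child, library)
print(meta.entryName, "takes its pins from", pin_source.entryName)
```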

Rejecting parts

Some people say that you can get 80% of the value of most projects with 20% of the work. To that end, I’m simply automatically rejecting any parts that don’t fit my expectations, rather than trying to deal with them in some way. The script won’t process parts for any of these reasons:

Unexpected number of pins for SOT or TO-92 package
DIP package part doesn’t have an even number of pins
Footprint not in my list of recognized footprints (more on this later)
Too many pins have an empty label
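These checks could be expressed as a single function that returns the reason for rejection, which also makes it easy to count reasons later. Everything specific here is an assumption for illustration: the package labels, the expected pin counts, and especially the 25% threshold for empty labels.

```python
from typing import List, Optional

# Illustrative values only; the real script may use different rules.
EXPECTED_PIN_COUNTS = {"TO-92": 3, "SOT-23": 3}
MAX_EMPTY_LABEL_FRACTION = 0.25


def rejection_reason(package: Optional[str], pin_names: List[str]) -> Optional[str]:
    """Return why a part should be skipped, or None if it looks usable."""
    if package is None:
        return "footprint not recognized"
    expected = EXPECTED_PIN_COUNTS.get(package)
    if expected is not None and len(pin_names) != expected:
        return f"unexpected number of pins for {package}"
    if package == "DIP" and len(pin_names) % 2 != 0:
        return "DIP part has an odd number of pins"
    # An unlabeled pin shows up as an empty string or a bare "~".
    empty = sum(1 for name in pin_names if name.strip() in ("", "~"))
    if pin_names and empty / len(pin_names) > MAX_EMPTY_LABEL_FRACTION:
        return "too many pins have an empty label"
    return None


print(rejection_reason("DIP", ["A", "B", "C"]))
```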

Presenting the pinouts

So now we’ve got symbol names, a description, the name of a footprint, and pin names. How can we make this useful?

Package description

First is the problem of describing the package. As I mentioned before, there are only a few dozen different types of electronic component packages. They have names like DIP, BGA, SOT, TO-220, QFP, PLCC, etc… Often a single package type can come in variations with different numbers of pins - for example the DIP-20 has 20 pins, while a DIP-8 has 8 pins. But the pin numbering scheme for those packages follows the same pattern each time. So I decided to focus on a few package types, write a generic description of how the pins are laid out for those packages, and then choose the best description for each part.

For now, I’m only generating descriptions for DIP, TO-220, and TO-92 parts. These are the most common through-hole component packages we’re likely to encounter when building prototypes and breadboarding. I then determined what footprints correspond to what package by simply grepping through the footprint names. I’m also leaving out things like capacitors, LEDs, and diodes which don’t have part specific pinouts, although it would be a good idea to write a page on how to identify their leads.

When processing a part, the script looks for the footprint name in a dictionary called PACKAGE_REGISTRY in packages.py. This maps the footprint name (or ki_fp_filters name) to a Package object, where Package is a class I created to hold part package descriptions. Each Package object has a textual description like this one for the TO-92 package:

This small package has three pins protruding from one end. The body of the part has a curved side and a flat side, forming a D-shaped cross section. With the flat side facing up and the three pins pointing toward you, the pins are numbered 1, 2, 3 from left to right.

It also has an optional concept called group_names which helps provide some simple groupings of pins for packages with a larger pin count. To understand what this does, let’s consider some examples. A DIP-8 package has 8 pins - since it’s a “Dual Inline Package” these are organized into two parallel rows of 4 pins. So it would be nice to group them that way on the page, and let the reader know “these are the left side pins” and “these are the right side pins”. Likewise, a QFP package has pins on all 4 sides, (usually) split evenly between all the sides. So it would be nice to be able to separate the top pins, bottom pins, left pins, and right pins. That way you only need to start counting from one end rather than from pin 1. If the Package class has a group_names list, when we’re formatting the pins table we’ll break the pins evenly into groups and assign them in order to those names.
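The even split can be sketched in a few lines. The function name group_pins and the group labels are assumptions, and the sketch assumes the pins are already sorted in numbering order.

```python
from typing import List, Sequence, Tuple


def group_pins(pins: Sequence, group_names: Sequence[str]) -> List[Tuple[str, list]]:
    """Split pins (already in numbering order) evenly across the group names."""
    if not group_names:
        # Packages without groups get one unnamed group holding every pin.
        return [("", list(pins))]
    size, remainder = divmod(len(pins), len(group_names))
    if remainder:
        raise ValueError("pins do not divide evenly into the named groups")
    return [
        (name, list(pins[i * size:(i + 1) * size]))
        for i, name in enumerate(group_names)
    ]


dip8_pins = [1, 2, 3, 4, 5, 6, 7, 8]
for name, members in group_pins(dip8_pins, ["Left side pins", "Right side pins"]):
    print(name, members)
```

Raising on an uneven split mirrors the idea of rejecting parts that don't fit expectations instead of guessing where the boundary falls.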

As you might guess, many different footprint names can map to the same package type. I decided not to bother writing different descriptions for 8 pin, 10 pin, 12 pin, and so on versions of the DIP package. Instead, I searched through the footprint names for all the DIP footprints and assigned them to one single DIP package description. This works because they’ll always have an even number of pins and I can always split them into left and right groups, even if the cutoff differs between DIP-10 and DIP-20. (Aside: any parts that violate this even number rule are simply thrown out for now; I may go back and figure out what’s going on there later).
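As a sketch of that many-to-one mapping: the Package fields, the lookup_package helper, and the short list of footprint names below are assumptions for illustration (the real registry in packages.py is much longer), and the DIP description is left as a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Package:
    name: str
    description: str
    group_names: Tuple[str, ...] = ()


DIP = Package(
    name="DIP",
    description="(generic DIP description goes here)",
    group_names=("Left side pins", "Right side pins"),
)
TO92 = Package(
    name="TO-92",
    description="This small package has three pins protruding from one end.",
)

# Many footprint names (and ki_fp_filters strings) share one Package object.
DIP_FOOTPRINTS = [
    "Package_DIP:DIP-8_W7.62mm",
    "Package_DIP:DIP-14_W7.62mm",
    "DIP*W7.62mm*",
]
PACKAGE_REGISTRY: Dict[str, Package] = {name: DIP for name in DIP_FOOTPRINTS}
PACKAGE_REGISTRY["Package_TO_SOT_THT:TO-92_Inline"] = TO92


def lookup_package(footprint: str) -> Optional[Package]:
    """Return the Package for a footprint name, or None if unrecognized."""
    return PACKAGE_REGISTRY.get(footprint)


print(lookup_package("Package_DIP:DIP-14_W7.62mm").name)
```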

Cleaning up pin names

KiCAD has some formatting conventions for the pin names that don’t read well with a screen reader. For example, a tilde with braces will cause KiCAD to draw a bar over the text in the braces. This is a convention to show that a pin is inverted or “active low”. For example, ~{RESET} means the pin is in the reset state when it’s low, not when it’s high. Similarly _{foo} indicates that foo is a subscript, and ^{foo} indicates that foo is a superscript. My reformat_label() function makes a basic attempt to translate these to something that reads well using regular expressions. For example, ~{RESET} will be changed to RESET (active low) which is more meaningful.
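A minimal version of such a translation, using the standard re module, could look like this. Only the "(active low)" wording comes from the description above; rendering subscripts and superscripts as "sub" and "super" is my own assumption about what might read well.

```python
import re


def reformat_label(label: str) -> str:
    """Rewrite KiCAD pin-name markup into plain words for a screen reader."""
    # ~{NAME} draws an overbar, meaning the signal is active low.
    label = re.sub(r"~\{([^{}]*)\}", r"\1 (active low)", label)
    # _{x} is a subscript and ^{x} is a superscript.
    label = re.sub(r"_\{([^{}]*)\}", r" sub \1", label)
    label = re.sub(r"\^\{([^{}]*)\}", r" super \1", label)
    return label.strip()


for raw in ("~{RESET}", "V_{CC}", "CLK"):
    print(raw, "->", reformat_label(raw))
```

Labels without any markup pass through unchanged, so the function is safe to apply to every pin name.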

Generating HTML

First, I decided to present the output as HTML. HTML is obviously easy to share on the web, and it’s also easy for screen readers to understand. As long as I use reasonable semantic tags and make sure the content can be read in a linear fashion, someone with a screen reader (or Braille display) can access it.

For this I simply used the popular Jinja2 templating library. I wrote a simple template that takes in all the part metadata I looked up and formats it into an HTML file. My code runs this once per part-footprint combination and spits them out into a directory of my choice. Since blindmakers.net is a static website generated using Hugo, it’s easy to incorporate these HTML files into the site. If I choose to serve the site differently in the future, it will still be easy to work with these HTML files.
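To give a feel for the rendering step, here is a small sketch using Jinja2's Environment.from_string. The template text, variable names, and example part are all invented for illustration and are not the actual template from the repo. The table uses a caption and scoped header cells, which is one common way to keep tables readable for screen readers.

```python
from jinja2 import Environment

# Invented template; the real one in the repo may be structured differently.
TEMPLATE = """\
<h1>{{ part }} pinout ({{ package }})</h1>
<p>{{ description }}</p>
<table>
  <caption>Pin assignments</caption>
  <thead>
    <tr><th scope="col">Pin</th><th scope="col">Name</th></tr>
  </thead>
  <tbody>
  {%- for number, name in pins %}
    <tr><th scope="row">{{ number }}</th><td>{{ name }}</td></tr>
  {%- endfor %}
  </tbody>
</table>
"""

# autoescape=True so characters like & or < in pin names stay valid HTML.
env = Environment(autoescape=True)
html = env.from_string(TEMPLATE).render(
    part="EXAMPLE-PART",
    package="TO-220",
    description="(placeholder description)",
    pins=[("1", "IN"), ("2", "GND"), ("3", "OUT")],
)
print(html)
```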

The only downside right now is search. With a static site generator, there’s no search built in. I have a long pinouts table page that you can search using Ctrl+F, and Google does a good job indexing these pages. So if you search Google for “site:blindmakers.net 4027 pinout” you’ll find the right page on blindmakers.net.

Conclusion + future work

I got something working in just a few days, and it contains most of the basic parts I’m interested in. Now I’m waiting for feedback to see if this idea is useful and worth investing more.

One thing to do is to get statistics on why parts are being rejected and see if the information is indeed there to produce pinouts for them. There are probably things I’m missing about symbol definitions, like I did with inheritance at first.

Other work includes:

Describing more package types
Better filtering of parts with missing labels. Some parts (e.g. the LM324) don’t have pin labels for several of the pins. They appear in KiCAD’s library as ~.
Alternative names. The same chip can have multiple part numbers with the same pinout and package. This is particularly true of long-lived parts like the 74xx logic series where the 74HC74 and the 74LS74 and the 74LS74A and 74LS74AN are all part number variations of essentially the same part. If you know this it’s easy to try different ones until you find the symbol that happens to exist in the KiCAD library, but if you don’t you might not realize the pinout is actually there.
Search
Datasheet links + more part information
Additional parts libraries

by Troy

May 20, 2024