Planet Debian

Updated: 8 hours 19 min ago

Julien Danjou: Profiling Python using cProfile: a concrete case

Mon, 16/11/2015 - 16:00

Writing programs is fun, but making them fast can be a pain. Python programs are no exception to that, but the basic profiling toolchain is actually not that complicated to use. Here, I would like to show you how you can quickly profile and analyze your Python code to find what part of the code you should optimize.

What's profiling?

Profiling a Python program is doing a dynamic analysis that measures the execution time of the program and everything that composes it. That means measuring the time spent in each of its functions. This will give you data about where your program is spending time, and what area might be worth optimizing.

It's a very interesting exercise. Many people focus on local optimizations, such as determining which of the Python functions range or xrange is going to be faster. It turns out that knowing which one is faster may never be an issue in your program, and that the time gained by one of the functions above might not be worth the time you spend researching that, or arguing about it with your colleague.

Trying to blindly optimize a program without measuring where it is actually spending its time is a useless exercise. Following your gut alone is not always sufficient.

There are many types of profiling, as there are many things you can measure. In this exercise, we'll focus on CPU utilization profiling, meaning the time spent by each function executing instructions. Obviously, we could do many more kinds of profiling and optimization, such as memory profiling, which would measure the memory used by each piece of code – something I talk about in The Hacker's Guide to Python.


Since Python 2.5, Python provides a C module called cProfile which has a reasonable overhead and offers a good enough feature set. Basic usage boils down to:

>>> import cProfile
>>>'2 + 2')
2 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

Though you can also run a script with it, which turns out to be handy:

$ python -m cProfile -s cumtime
72270 function calls (70640 primitive calls) in 4.481 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 4.481 4.481<module>)
1 0.001 0.001 4.296 4.296
3 0.000 0.000 4.286 1.429
3 0.000 0.000 4.268 1.423
4/3 0.000 0.000 3.816 1.272
4 0.000 0.000 2.965 0.741
4 0.000 0.000 2.962 0.740
4 0.000 0.000 2.961 0.740
2 0.000 0.000 2.675 1.338
30 0.000 0.000 1.621 0.054
30 0.000 0.000 1.621 0.054
30 1.621 0.054 1.621 0.054 {method 'read' of '_ssl._SSLSocket' objects}
1 0.000 0.000 1.611 1.611
4 0.000 0.000 1.572 0.393
4 0.000 0.000 1.572 0.393
60 0.000 0.000 1.571 0.026
4 0.000 0.000 1.571 0.393
1 0.000 0.000 1.462 1.462
1 0.000 0.000 1.462 1.462
1 0.000 0.000 1.462 1.462
1 0.000 0.000 1.459 1.459

This prints out all the functions called, with the time spent in each and the number of times they were called.

Advanced visualization with KCacheGrind

While useful, this output format is very basic and does not make it easy to get an overview of a complete program. For more advanced visualization, I leverage KCacheGrind. If you have done any C programming and profiling in recent years, you may have used it, as it is primarily designed as a front-end for Valgrind-generated call graphs.

In order to use it, you need to generate a cProfile result file, then convert it to the KCacheGrind format. To do that, I use pyprof2calltree.

$ python -m cProfile -o myscript.cprof
$ pyprof2calltree -k -i myscript.cprof

And the KCacheGrind window magically appears!
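
If you would rather stay in the terminal, the same result file can also be inspected with the standard pstats module – a minimal sketch, reusing the myscript.cprof file produced above:

import pstats

# Load the cProfile result file and print the 10 most expensive
# entries, sorted by cumulative time.
stats = pstats.Stats('myscript.cprof')
stats.sort_stats('cumulative').print_stats(10)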

Concrete case: Carbonara optimization

I was curious about the performance of Carbonara, the small timeseries library I wrote for Gnocchi. I decided to do some basic profiling to see if there was any obvious optimization to do.

In order to profile a program, you need to run it. But running the whole program in profiling mode can generate a lot of data that you don't care about, and adds noise to what you're trying to understand. Since Gnocchi has thousands of unit tests and a few for Carbonara itself, I decided to profile the code used by these unit tests, as it's a good reflection of basic features of the library.

Note that this is a good strategy for a curious and naive first-pass profiling. There's no way to make sure that the hotspots you see in the unit tests are the actual hotspots you will encounter in production. Therefore, profiling in conditions and with a scenario that mimic what's seen in production is often a necessity if you need to push your program optimization further and want to achieve perceivable and valuable gains.

I activated cProfile using the method described above, creating a cProfile.Profile object around my tests (I actually started to implement that in testtools). I then ran KCacheGrind as described above. Using KCacheGrind, I generated the following figures.
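
Wrapping a test in a profiler object only takes a few lines; here is a minimal sketch of that approach, with run_the_test standing in for the actual test body:

import cProfile

def run_the_test():
    # stand-in for the real unit test being measured
    sum(range(100000))

profiler = cProfile.Profile()
profiler.enable()
run_the_test()
profiler.disable()
# Save the result so it can be converted with pyprof2calltree
profiler.dump_stats('test_fetch.cprof')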

The test I profiled here is called test_fetch and is pretty easy to understand: it puts data in a timeseries object, and then fetches the aggregated result. The KCacheGrind list above shows that 88% of the ticks are spent in set_values (44 ticks out of 50). This function is used to insert values into the timeseries, not to fetch the values. That means that it's really slow to insert data, and pretty fast to actually retrieve it.

Reading the rest of the list indicates that several functions share the rest of the ticks: update, _first_block_timestamp, _truncate, _resample, etc. Some of the functions in the list are not part of Carbonara, so there's no point in trying to optimize them. The only thing that can sometimes be optimized is the number of times they're called.

The call graph gives me a bit more insight about what's going on here. Using my knowledge about how Carbonara works, I don't think that the whole stack on the left for _first_block_timestamp makes much sense. This function is supposed to find the first timestamp for an aggregate, e.g. for a timestamp of 13:34:45 and a period of 5 minutes, the function should return 13:30:00. The way it currently works is by calling the resample function from Pandas on a timeseries with only one element, but that seems to be very slow. Indeed, this function currently represents 25% of the time spent by set_values (11 ticks out of 44).

Fortunately, I recently added a small function called _round_timestamp that does exactly what _first_block_timestamp needs, without calling any Pandas function – so no resample. So I ended up rewriting that function this way:

def _first_block_timestamp(self):
- ts = self.ts[-1:].resample(self.block_size)
- return (ts.index[-1] - (self.block_size * self.back_window))
+ rounded = self._round_timestamp(self.ts.index[-1], self.block_size)
+ return rounded - (self.block_size * self.back_window)
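
For illustration, here is a standalone sketch of what such a rounding helper can look like – not Carbonara's actual code, just the underlying arithmetic:

import datetime

def round_timestamp(ts, period):
    # Round ts down to the previous multiple of period (a timedelta),
    # e.g. 13:34:45 with a 5-minute period becomes 13:30:00.
    seconds = (ts - datetime.datetime.min).total_seconds()
    rounded = seconds - (seconds % period.total_seconds())
    return datetime.datetime.min + datetime.timedelta(seconds=rounded)

print(round_timestamp(datetime.datetime(2015, 11, 16, 13, 34, 45),
                      datetime.timedelta(minutes=5)))
# 2015-11-16 13:30:00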

I then re-ran the exact same test to compare the output of cProfile.

The list of functions looks quite different this time. The share of time used by set_values dropped from 88% to 71%.

The call stack for set_values shows that pretty well: we can't even see the _first_block_timestamp function, as it is so fast that it has totally disappeared from the display. It's now considered insignificant by the profiler.

So we just sped up the whole insertion process of values into Carbonara by a nice 25% in a few minutes. Not that bad for a first naive pass, right?

Categories: Elsewhere

Wouter Verhelst: terrorism

Mon, 16/11/2015 - 09:07

noun | ter·ror·ism | \ˈter-ər-ˌi-zəm\ | no plural

The mistaken belief that it is possible to change the world through acts of cowardice.

They killed a lot of people, but their terrorism only intensified the people's resolve.

Categories: Elsewhere

Norbert Preining: Movies: Monuments Men and Interstellar

Mon, 16/11/2015 - 01:01

Over the rainy weekend we watched two movies: Monuments Men (in Japanese it is called Michelangelo Project!) and Interstellar. Both are blockbuster movies from the usual American companies, but they are light-years apart when it comes to quality. The Monuments Men is boring, without a story, without depth, historically inaccurate – a complete failure. Interstellar, although a long movie, keeps you frozen in your seat while being as scientific as possible, and gets your brain working heavily.

My personal verdict: 3 rotten eggs (because Rotten Tomatoes are not stinky enough) for the Monuments Men, and 4 stars for Interstellar.


First, the plots of the two movies: The Monuments Men is loosely based on a true story about rescuing pieces of art at the end of the Second World War, before the Nazis destroy them or the Russians take them away. A group of art experts is sent into Europe and manages to find several hiding places of art taken by the Nazis.

Interstellar is set in a near future where conditions on Earth are deteriorating to a degree that human life seems soon to be impossible. Some years before the movie takes place, a group of astronauts was sent through a wormhole into a different galaxy to search for new inhabitable planets. Now it is time to check out these planets and try to establish colonies there. Cooper, a retired NASA officer and pilot now working as a farmer, and his daughter are guided in some mysterious way to a secret NASA facility. Cooper is drafted to pilot the reconnaissance mission, and leaves Earth and our galaxy through the same wormhole. (Not telling more!)

Monuments Men

Looking at the cast of Monuments Men (George Clooney, Matt Damon, Bill Murray, John Goodman, Jean Dujardin, Bob Balaban, Hugh Bonneville, and Cate Blanchett), one would expect a great movie – but from the very first to the very last scene, it is a slowly meandering, shallow flow of scenes stuck together without much coherence. Tension is generated only through unrelated events (stepping onto a landmine, patting a horse), but never developed properly. The dialogue is shallow and boring – with one exception: when Frank Stokes (George Clooney) meets the one German and inquires generally about the art, predicting his future of being hanged.

Historically, the movie is as inaccurate as it can be – despite Clooney stating that “80 percent of the story is still completely true and accurate, and almost all of the scenes happened”. That contrasts starkly with the verdict of Nigel Pollard (Swansea University): “There’s a kernel of history there, but The Monuments Men plays fast and loose with it in ways that are probably necessary to make the story work as a film, but the viewer ends up with a fairly confused notion of what the organisation was, and what it achieved.”

The movie leaves a bitter aftertaste, hailing American heroism paired with the usual stereotypes (the amorous French, the dim-witted German, the ignorant Russian, etc.). Together with the half-baked dialogue it feels like a permanent coitus interruptus.


Interstellar cannot boast a similar cast, but still has a few well-known names (Matthew McConaughey, Anne Hathaway, and Michael Caine!). But I believe this is actually a sign of quality. Carefully balancing scientific accuracy against the requirements of a blockbuster, the movie successfully bridges the gap between complicated science – in particular gravity – and entertainment. While not going so far as to call the movie edutainment (like both the old and new Cosmos), it is surprising how much hard science is packed into it. This is mostly thanks to the theoretical physicist Kip Thorne acting as scientific consultant for the movie, but also due to the director Christopher Nolan taking it seriously and studying relativity at Caltech.

Of course, scientific accuracy has limits – nobody knows what happens if one crosses the event horizon of a black hole, and even the existence of wormholes is purely theoretical for now. Still, throughout, the movie follows the two requirements laid out by Kip Thorne: “First, that nothing would violate established physical laws. Second, that all the wild speculations… would spring from science and not from the fertile mind of a screenwriter.”

I think the biggest compliment was that, despite the length, despite a long day out (see next blog), and despite the rather unfamiliar topic, my wife – who is normally not interested in space movies and the like – didn't fall asleep during the movie, and I had to stop several times to explain details of the theory of gravity and astronomy. So in some sense it was perfect edutainment!

Categories: Elsewhere

Manuel A. Fernandez Montecelo: Work on aptitude

Mon, 16/11/2015 - 00:44

Midsummer for me is also known as “Noite do Lume Novo” (literally “New Fire Night”), one of the big calendar events of the year, marking the end of the school year and the beginning of summer.

On this day, there are celebrations not very unlike the bonfires of Guy Fawkes Night in England or Britain [1]. It is a bit different in that it is not a single event for the masses, but more of a friends-and-neighbours thing, and it lasts for a big chunk of the night (sometimes until morning). Perhaps for some people, or outside bigger towns or cities, Guy Fawkes Night is also celebrated in that way ─ and that's why during the first days of November there are fireworks rocketing and cracking in the neighbourhoods all around.

Like many other celebrations around the world involving bonfires, many of them also happening around the summer solstice, it is supposed to be a time of renewal of cycles, purification and keeping the evil spirits away; with rituals to that effect like jumping over the fire ─ when the flames are not high and it is safe enough.

So it was fitting that, in the middle of June (almost Midsummer in the northern hemisphere), I learnt that I was about to leave my now-previous job, which is a pretty big signal and precursor for renewal (and it might have something to do with purifying and keeping the evil away as well ;-) ).

Whatever... But what does all of this have to do with aptitude or Debian, anyway?

For one, it was a question of timing.

While looking for a new job (and I am still at it), I had more spare time than usual. DebConf 15 @ Heidelberg was within sight, and for the first time circumstances allowed me to attend this event.

It also coincided with the time when I re-gained access to commit to aptitude on the 19th of June. Which means Renewal.

End of June was also the time of the announcement of the colossal GCC-5/C++11 ABI transition in Debian, which was scheduled to start on the 1st of August, just before DebConf. Between 2 and 3 thousand source packages in Debian were affected by this transition, which a few months later is not yet finished (although the most important parts were completed by mid-to-late September).

aptitude itself is written in C++, and depends on several libraries written in C++, like Boost, Xapian and SigC++. All of them had to be compiled with the new C++11 ABI of GCC-5, in unison and in a particular order, for aptitude to continue to work (and for minimal breakage). aptitude and some dependencies did not even compile straight away, so this transition meant that aptitude needed attention just to keep working.

Having recently been awarded the Aptitude Hat again, attending DebConf for the first time and sailing towards the Transition Maelstrom, it was a clear sign that Something Had to Be Done (to avoid the sideways looks and consequent shame at DebConf, if nothing else).

Happily (or a bit unhappily for me, but let's pretend...), with the unexpected free time in my hands, I changed the plans that I had before re-gaining the Aptitude Hat (some of them involving Debian, but in other ways ─ maybe I will post about that soon).

In July I worked to fix the problems before the transition started, so aptitude would be (mostly) ready, or in the worst case broken only for a few days, while the chain of dependencies was rebuilt. But apart from the changes needed for the new GCC-5, it was decided at the last minute that Boost 1.55 would not be rebuilt with the new ABI, and that the only version with the new ABI would be 1.58 (which caused further breakage in aptitude, was added to experimental only a few days before, and was moved to unstable after the transition had started). Later, in the first days of the transition, aptitude was affected for a few days by breakage in the dependencies, due to not being compiled in sequence according to the transition levels (so with a mix of old and new ABI).

With the critical intervention of Axel Beckert (abe / XTaran), things were not so bad as they could have been. He was busy testing and uploading in the critical days when I was enjoying a small holiday on my way to DebConf, with minimal internet access and communicating almost exclusively with him; and he promptly tended the complaints arriving in the Bug Tracking System and asked for rebuilds of the dependencies with the new ABI. He also brought the packaging up to shape, which had decayed a bit in the last few years.

Gruesome Challenges

But not all was solved yet, more storms were brewing and started to appear in the horizon, in the form of clouds of fire coming from nearby realms.

The APT Deities, which had long ago spilled out their secret, inner challenge (just the initial paragraphs), were relentless. Moreover, they were present at Heidelberg in full force, in ─or close to─ their home grounds, and they were Marching Decidedly towards Victory:

In the talk @ DebConf “This APT has Super Cow Powers” (video available), by David Kalnischkies, they told us about the niceties of apt 1.1 (still in experimental but hopefully coming to unstable soon), and they boasted about getting the lead in our arms race (should I say bugs race?) by a few open bug reports.

This act of provocation further escalated the tensions. The fierce competition which had been going on for some time gained new heights. So much so that APT Deities and our team had to sit together in the outdoor areas of the venue and have many a weissbier together, while discussing and fixing bugs.

But beneath the calm on the surface, and while pretending to keep good diplomatic relations, I knew that Something Had to Be Done, again. So I could only do one thing ─ jump over the bonfire and Keep the Evil away, be that Keep Evil bugs Away or Keep Evil APT Deities Away from winning the challenge, or both.

After returning from DebConf I continued to dedicate time to the project, more than a full time job in some weeks, and this is what happened in the last few months, summarised in another graph, showing the evolution of the BTS for aptitude:

The numbers for apt right now (15th November 2015) are:

  • 629 open (731 if counting all merged bugs independently)
  • 0 Release Critical
  • 275 (318 unmerged) with severity Important or Normal
  • 354 (413 unmerged) with severity Minor or Wishlist
  • 0 marked as Forwarded or Pending

The numbers for aptitude right now are:

  • 488 open (573 if counting all merged bugs independently)
  • 1 Release Critical (but it is an artificial bug to keep it from migrating to testing)
  • 197 (239 unmerged) with severity Important or Normal
  • 271 (313 unmerged) with severity Minor or Wishlist
  • 19 (20 unmerged) marked as Forwarded or Pending
The Aftermath

As we can see, for the time being I could keep the Evil at bay, both in terms of bugs themselves and re-gaining the lead in the bugs race ─ the Evil APT Deities were thwarted again in their efforts.

... More seriously, as most of you suspected, the graph above is not the whole truth, so I don't want to boast too much. A big part of the reduction in the number of bugs comes from merging duplicates, closing obsolete bugs, applying translations coming from multiple contributors, or simple fixes like typos and useful suggestions needing minor changes. Many of the remaining problems are comparatively more difficult or time-consuming than the ones addressed so far (except perhaps avoiding the immediate breakage during the transition, which took weeks to solve), and there are many important problems still there, chief among them aptitude offering very poor solutions to resolve conflicts.

Still, even the simplest of changes takes effort, and triaging hundreds of bugs is not fun at all and mostly a thankless effort ─ although there is the occasional kind soul who thanks you for handling a decade-old bug.

If being subjected to the rigours of the BTS and reading and solving hundreds of bug reports is not Purification, I don't know what it is.

Apart from the triaging, there were 118 bugs closed (or pending) due to changes made in the upstream part or the packaging in the last few months, and there are many changes that are not reflected in bugs closed (like most of the changes needed due to the C++11 ABI transition, bugs and problems fixed that had no report, and general rejuvenation or improvement of some parts of the code).

How long this will last, I cannot know. I hope to find a job at some point, which obviously will reduce the time available to work on this.

But in the meantime, for all aptitude users: Enjoy the fixes and new features!


[1] ^ Some visitors of the recent mini-DebConf @ Cambridge perhaps thought that the fireworks and throngs gathered were in honour of our mighty Universal Operating System, but sadly they were not. They might be, some day. In any case, the reports say that the visitors enjoyed the fireworks.

Categories: Elsewhere

Carl Chenet: Retweet 0.5 : only retweet some tweets

Mon, 16/11/2015 - 00:00

Retweet 0.5 is now available! The main new feature: Retweet now lets you define a list of hashtags; if any of them appears in the text of a tweet, that tweet is not retweeted.

You only need this line in your retweet.ini configuration file:


Have a look at the official documentation to read about how it works in detail.

Retweet 0.5 is available on the PyPI repository and is already in the official Debian unstable repository.

Retweet is already in production for Le Journal du hacker, a French FOSS community website to share and relay news, and for a job board for the French-speaking FOSS community.

What about you? Does Retweet help you grow your Twitter account? Leave your comments on this article.

Categories: Elsewhere

Dirk Eddelbuettel: Rcpp 0.12.2: More refinements

Sun, 15/11/2015 - 21:45

The second update in the 0.12.* series of Rcpp is now on the CRAN network for GNU R. As usual, I will also push a Debian package. This follows the 0.12.0 release from late July, which started to add some serious new features, and builds upon the 0.12.1 release in September. It also marks the sixth release this year, keeping up a steady bi-monthly release frequency.

Rcpp has become the most popular way of enhancing GNU R with C or C++ code. As of today, 512 packages on CRAN depend on Rcpp for making analytical code go faster and further. That is up by more than fifty packages since the last release in September (and we recently blogged about crossing 500 dependents).

This release once again features pull requests from two new contributors, with Nathan Russell and Tianqi Chen joining in. As shown below, other recent contributors (such as Dan) are keeping at it too. Keep 'em coming! Luke Tierney also emailed about a code smell he spotted, which we took care of. A big Thank You! to everybody helping with code, bug reports or documentation. See below for a detailed list of changes extracted from the NEWS file.

Changes in Rcpp version 0.12.2 (2015-11-14)
  • Changes in Rcpp API:

    • Correct return type in product of matrix dimensions (PR #374 by Florian)

    • Before creating a single String object from a SEXP, ensure that it is from a vector of length one (PR #376 by Dirk, fixing #375).

    • No longer use STRING_ELT as a left-hand side, thanks to a heads-up by Luke Tierney (PR #378 by Dirk, fixing #377).

    • Rcpp Module objects are now checked more carefully (PR #381 by Tianqi, fixing #380)

    • An overflow in Matrix column indexing was corrected (PR #390 by Qiang, fixing a bug reported by Allessandro on the list)

    • Nullable types can now be assigned R_NilValue in function signatures. (PR #395 by Dan, fixing issue #394)

    • operator<<() now always shows decimal points (PR #396 by Dan)

    • Matrix classes now have a transpose() function (PR #397 by Dirk fixing #383)

    • operator<<() for complex types was added (PRs #398 by Qiang and #399 by Dirk, fixing #187)

  • Changes in Rcpp Attributes:

    • Enable export of C++ interface for functions that return void.

  • Changes in Rcpp Sugar:

    • Added new Sugar function cummin(), cummax(), cumprod() (PR #389 by Nathan Russell fixing #388)

    • Enabled sugar math operations for subsets; e.g. x[y] + x[z]. (PR #393 by Kevin and Qiang, implementing #392)

  • Changes in Rcpp Documentation:

    • The NEWS file now links to GitHub issue tickets and pull requests.

    • The Rcpp.bib file with bibliographic references was updated.

Thanks to CRANberries, you can also look at a diff to the previous release. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page, which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc. should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: Elsewhere

Lunar: Reproducible builds: week 29 in Stretch cycle

Sun, 15/11/2015 - 18:51

What happened in the reproducible builds effort this week:

Toolchain fixes

Emmanuel Bourg uploaded eigenbase-resgen/, which uses the scm-safe comment style by default to make the generated files deterministic.

Mattia Rizzolo started a new thread on debian-devel to ask a wider audience about issues with the -Wdate-time compile-time flag. When enabled, GCC and clang print warnings when __DATE__, __TIME__, or __TIMESTAMP__ are used. Having the flag set by default would prompt maintainers to remove these sources of unreproducibility.

Packages fixed

The following packages have become reproducible due to changes in their build dependencies: bmake, cyrus-imapd-2.4, drobo-utils, eigenbase-farrago, fhist, fstrcmp, git-dpm, intercal, libexplain, libtemplates-parser, mcl, openimageio, pcal, powstatd, ruby-aggregate, ruby-archive-tar-minitar, ruby-bert, ruby-dbd-odbc, ruby-dbd-pg, ruby-extendmatrix, ruby-rack-mobile-detect, ruby-remcached, ruby-stomp, ruby-test-declarative, ruby-wirble, vtprint.

The following packages became reproducible after getting fixed:

Some uploads fixed some reproducibility issues, but not all of them:

Patches submitted which have not made their way to the archive yet:

  • #804729 on pbuilder by Reiner Herrmann: tell dblatex to build in a deterministic path.

The fifth and sixth armhf build nodes have been set up, resulting in five more builder jobs for armhf. More than 10,000 packages have now been identified as reproducible with the “reproducible” toolchain on armhf. (Vagrant Cascadian, h01ger)

Helmut Grohne and Mattia Rizzolo now have root access on all 12 build nodes used by and (h01ger) is now linked from all package pages and the dashboard. (h01ger)

profitbricks-build5-amd64 and profitbricks-build6-amd64, responsible for running amd64 tests now run 398.26 days in the future. This means that one of the two builds that are being compared will be run on a different minute, hour, day, month, and year. This is not yet the case for armhf. FreeBSD tests are also done with 398.26 days difference. (h01ger)

The design of the Arch Linux test page has been greatly improved. (Levente Polyak)

diffoscope development

Three releases of diffoscope happened this week numbered 39 to 41. It includes support for EPUB files (Reiner Herrmann) and Free Pascal unit files, usually having .ppu as extension (Paul Gevers).

The rest of the changes were mostly targeted at making it easier to run diffoscope on other systems. The tlsh, rpm, and debian modules are now all optional. The test suite will properly skip tests that need optional tools or modules when they are not available. As a result, diffoscope is now available on PyPI, and in Arch Linux thanks to the work of Levente Polyak.

Getting these versions into Debian was a bit cumbersome. Version 39 was uploaded with an expired key (according to the keyring on, which will hopefully be updated soon); this is currently handled by keeping the files in the queue without REJECTing them, which prevented any other Debian Developer from uploading the same version. Version 40 was uploaded as a source-only upload… but failed to build from source, which had the undesirable side effect of removing the previous version from unstable. The package failed to build from source because it was built passing -I to debuild. This excluded the ELF object files and static archives used by the test suite from the source package, preventing the test suite from working correctly. Hopefully, in the near future it will be possible to implement a sanity check to prevent such mistakes.

It has also been identified that ppudump outputs time in the system timezone without considering the TZ environment variable. Zachary Vance and Paul Gevers raised the issue on the appropriate channels.

strip-nondeterminism development

Chris Lamb released strip-nondeterminism version 0.014-1 which disables stripping Mono binaries as it is too aggressive and the source of the problem is being worked on by Mono upstream.

Package reviews

133 reviews have been removed, 115 added and 103 updated this week.

Chris West and Chris Lamb reported 57 new FTBFS bugs.


The video of h01ger and Chris Lamb's talk at MiniDebConf Cambridge is now available.

h01ger gave a talk at CCC Hamburg on November 13th, which was well received and sparked some interest among Gentoo folks. Slides and video should be available shortly.

Frederick Kautz has started to revive Dhiru Kholia's work on testing Fedora packages.

Your editor wishes to once again thank the #debian-reproducible regulars for reviewing these reports week after week.

Categories: Elsewhere

Simon McVittie: Discworld Noir in a Windows 98 virtual machine on Linux

Sun, 15/11/2015 - 18:19

Discworld Noir was a superb adventure game, but is also notoriously unreliable, even in Windows on real hardware; using Wine is just not going to work. After many attempts at bringing it back into working order, I've settled on an approach that seems to work: now that qemu and libvirt have made virtualization and emulation easy, I can run it in a version of Windows that was current at the time of its release. Unfortunately, Windows 98 doesn't virtualize particularly well either, so this still became a relatively extensive yak-shaving exercise.

These instructions assume that /srv/virt is a suitable place to put disk images, but you can use anywhere you want.

The emulated PC

After some trial and error, it seems to work if I configure qemu to emulate the following:

  • Fully emulated hardware instead of virtualization (qemu-system-i386 -no-kvm)
  • Intel Pentium III
  • Intel i440fx-based motherboard with ACPI
  • Real-time clock in local time
  • No HPET
  • 256 MiB RAM
  • IDE primary master: IDE hard disk (I used 30 GiB, which is massively overkill for this game; qemu can use sparse files so it actually ends up less than 2 GiB on the host system)
  • IDE primary slave, secondary master, secondary slave: three CD-ROM drives
  • PS/2 keyboard and mouse
  • Realtek AC97 sound card
  • Cirrus video card with 16 MiB video RAM

A modern laptop CPU is an order of magnitude faster than what Discworld Noir needs, so full emulation isn't a problem, despite being inefficient.

There is deliberately no networking, because Discworld Noir doesn't need it, and a 17 year old operating system with no privilege separation is very much not safe to use on the modern Internet!

Software needed
  • Windows 98 installation CD-ROM as a .iso file (cp /dev/cdrom windows98.iso) - in theory you could also use a real optical drive, but my laptop doesn't usually have one of those. I used the OEM disc, version 4.10.1998 (that's the original Windows 98, not the Second Edition), which came with a long-dead PC, and didn't bother to apply any patches.
  • A Windows 98 license key. Again, I used an OEM key from a past PC.
  • A complete set of Discworld Noir (English) CD-ROMs as .iso files. I used the UK "Sold Out Software" budget release, on 3 CDs.
  • A multi-platform Realtek AC97 audio driver.
Windows 98 installation

It seems to be easiest to do this bit by running qemu-system-i386 manually:

qemu-img create -f qcow2 /srv/virt/discworldnoir.qcow2 30G
qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \
    -drive media=cdrom,format=raw,file=/srv/virt/windows98.iso \
    -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime

Don't start the installation immediately. Instead, boot the installation CD to a DOS prompt with CD-ROM support. From here, run

fdisk
and create a single partition filling the emulated hard disk. When finished, hard-reboot the virtual machine (press Ctrl+C on the qemu-system-i386 process and run it again).

The DOS FORMAT.COM utility is on the Windows CD-ROM but not in the root directory or the default %PATH%, so you'll have to run:

d:\win98\format c:

to create the FAT filesystem. You might have to reboot again at this point.

The reason for doing this the hard way is that the Windows 98 installer doesn't detect qemu as supporting ACPI. You want ACPI support, so that Windows will issue IDLE instructions from its idle loop, instead of occupying a CPU core with a busy-loop. To get that, boot to a DOS prompt again, and use:

setup /p j /iv

/p j forces ACPI support (Thanks to "Richard S" on the Virtualbox forums for this tip.) /iv is unimportant, but it disables the annoying "billboards" during installation, which advertised once-exciting features like support for dial-up modems and JPEG wallpaper.

I used a "Typical" installation; there didn't seem to be much point in tweaking the installed package set when everything is so small by modern standards.

Windows 98 has built-in support for the Cirrus VGA card that we're emulating, so after a few reboots, it should be able to run in a semi-modern resolution and colour depth. Discworld Noir apparently prefers a 640 × 480 × 16-bit video mode, so right-click on the desktop background, choose Properties and set that up.

Audio drivers

This is the part that took me the longest to get working. Of the sound cards that qemu can emulate, Windows 98 only supports the SoundBlaster 16 out of the box. Unfortunately, the Soundblaster 16 emulation in qemu is incomplete, and in particular version 2.1 (as shipped in Debian 8) has a tendency to make Windows lock up during boot.

I've seen advice in various places to emulate an Ensoniq ES1370 (SoundBlaster AWE 64), but that didn't work for me: one of the drivers I tried caused Windows to lock up at a black screen during boot, and the other didn't detect the emulated hardware.

The next-oldest sound card that qemu can emulate is a Realtek AC97, which was often found integrated into motherboards in the late 1990s. This one seems to work, with the "A400" driver bundle linked above. For Windows 98 first edition, you need a driver bundle that includes the old "VXD" drivers, not just the "WDM" drivers supported by Second Edition and newer.

The easiest way to get that into qemu seems to be to turn it into a CD image:

genisoimage -o /srv/virt/discworldnoir-drivers.iso WDM_A400.exe
qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \
    -drive media=cdrom,format=raw,file=/srv/virt/windows98.iso \
    -drive media=cdrom,format=raw,file=/srv/virt/discworldnoir-drivers.iso \
    -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime -soundhw ac97

Run the installer from E:, then reboot with the Windows 98 CD inserted, and Windows should install the driver.

Installing Discworld Noir

Boot up the virtual machine with CD 1 in the emulated drive:

qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \
    -drive media=cdrom,format=raw,file=/srv/virt/DWN_ENG_1.iso \
    -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime -soundhw ac97

You might be thinking "... why not insert all three CDs into D:, E: and F:?" but the installer expects subsequent disks to appear in the same drive where CD 1 was initially, so that won't work. Instead, when prompted for a new CD, switch to the qemu monitor with Ctrl+Alt+2 (note that this is 2, not F2). At the (qemu) prompt, use info block to see a list of emulated drives, then issue a command like

change ide0-cd1 /srv/virt/DWN_ENG_2.iso

to swap the CD. Then switch back to Windows' console with Ctrl+Alt+1 and continue installation. I used a Full installation of Discworld Noir.

Transferring the virtual machine to GNOME Boxes

Having finished the "control freak" phase of installation, I wanted a slightly more user-friendly way to run this game, so I transferred the virtual machine to be used by libvirtd, which is the backend for both GNOME Boxes and virt-manager:

virsh create discworldnoir.xml

Here is the configuration I used. It's a mixture of automatic configuration from virt-manager, and hand-edited configuration to make it match the qemu-system-i386 command-line.

Running the game

If all goes well, you should now see a discworldnoir virtual machine in GNOME Boxes, in which you can boot Windows 98 and play the game. Have fun!

Categories: Elsewhere

Daniel Pocock: Migrating data from Windows phones

Sat, 14/11/2015 - 22:18

Many of the people who have bought Windows phones seek relief sooner or later. Sometimes this comes about due to peer pressure or the feeling of isolation, in other cases it is the frustration of the user interface or the realization that they can't run cool apps like Lumicall.

Frequently, the user has been given the phone as a complimentary upgrade when extending a contract without perceiving the time, effort and potential cost involved in getting their data out of the phone, especially if they never owned a smartphone before.

When a Windows phone user does decide to cut their losses, they are usually looking to a friend or colleague with technical expertise to help them out. Personally, I'm not sure that anybody I would regard as an IT expert has ever had a Windows phone though, meaning that many experts are probably also going to be scratching their heads when somebody asks them for help. Therefore, I've put together this brief guide to help deal with these phones more expediently when they are encountered.

The Windows phones have really bad support for things like CalDAV and WebDAV so don't get your hopes up about using such methods to backup the data to any arbitrary server. Searching online you can find some hacks that involve creating a Google or iCloud account in the phone and then modifying the advanced settings to send the data to an arbitrary server. These techniques vary a lot between specific versions of the Windows Phone OS and so the techniques I've described below are probably easier.

Identify the Windows Live / Hotmail account

The user may not remember or realize that a Microsoft account was created when they first obtained the phone. It may have been created for them by the phone, a friend or the salesperson in the phone shop.

Look in the settings (Accounts) to find the account ID / email address. If the user hasn't been using this account, they may not recognize it and probably won't know the password for it. It is essential to try and obtain (or reset) the password before going any further, so start with the password recovery process. Microsoft may insist on sending a password reset email to some other email address that the user has previously provided or linked to their phone.

Extracting data from the phone

In many cases, the easiest way to extract the data is to download it from Microsoft rather than extracting it from the phone. Even if the user doesn't realize it, the data is probably all replicated on, so there is no further loss of privacy in logging in there to extract it.

Set up an IMAP mail client

An IMAP client will be used to download the user's emails (from the account they may never have used) and SMS.

Install Mozilla Thunderbird (IceDove on Debian), GNOME Evolution or a similar program on the user's PC.

Configure the IMAP mail client to connect to the account. Some clients, like Thunderbird, will automatically set up all the server details when you enter the account ID. For manual account setup, the details here may help.

Email backup

If the user was not using the account ID for email correspondence, there may not be a lot of mail in it. There may be some billing receipts or other things that are worth keeping though.

Create a new folder (or set of folders) in the user's preferred email account and drag and drop the messages from the Inbox to the new folder(s).

SMS backup

SMS backup can also be done through It is slightly more complicated than email backup, but similar.

  • In the Outlook email index page, look for the settings button and click Manage Categories.
  • Enable the Contacts and Photos categories with a tick in each of them.
  • Go back to the main Inbox page and look for the categories section on the bottom left-hand side of the screen, under the folder list. Click the Contacts category.
  • The page may now appear blank. That is normal.
  • On the top right-hand corner of the page, click the Arrange menu and choose Conversation.
  • All the SMS messages should now appear on the screen.
  • Under the mail folders list on the left-hand side of the page, click to create a new folder with a name like SMS.
  • Select all the SMS messages and look for the option to move them to a folder. Send them to the SMS folder you created.
  • Now use the IMAP mail client to locate the SMS folder and copy everything from there to a new folder in the user's preferred mail server or local disk.
Contacts backup

On the top left-hand corner of the email page, there is a chooser to select other applications. Select People.

You should now see a list of all the user's contacts. Look for the option to export them to Outlook and other programs. This will export them as a CSV file.

You can now import the CSV file into another application. GNOME Evolution has an import wizard with an option for Outlook file format. To load the contacts into a WebDAV address book, such as DAViCal, configure the address book in Evolution and then select it as the destination when running the CSV import wizard.

WARNING: beware of using the Mozilla Thunderbird address book with contact data from mobile devices and other sources. It can't handle more than two email addresses per contact and this can lead to silent data loss if contacts are not fully saved.

Calendar backup

Now go to the application chooser again and select the calendar application. Microsoft provides instructions to extract the calendar, summarised here:

  • Look for the Share button at the top somewhere and click it.
  • On the left-hand side of the page, click Get a link
  • On the right-hand side, choose Show event details to ensure you get a full calendar and then click Create underneath it.
  • Look for the link with a webcals prefix. If you are downloading with a tool like wget, change the scheme prefix to https. Fetch the file from this link and save it with an ics extension.
  • Inspect the ics calendar file to make sure it looks like real iCalendar data.

You can now import the ics file into another application. GNOME Evolution has an import wizard with an option for iCalendar file format. To load the calendar entries into a CalDAV server, such as DAViCal, configure the calendar server in Evolution and then select it as the destination when running the import wizard.
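
If you prefer to script the download-and-rename step, a minimal sketch in Python follows; the webcals link is a hypothetical placeholder for whatever link Outlook generated for you:

import urllib.request

# Hypothetical link copied from the calendar sharing page
link = "webcals://"
# Swap the webcals scheme for https, as described above
url = "https://" + link[len("webcals://"):]
with urllib.request.urlopen(url) as response, open("calendar.ics", "wb") as out:
    out.write(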

Backup the user's photos, videos and other data files

Hopefully you will be able to do this step without going through Try enabling the MTP or PTP mode in the phone and attach it to the computer using the USB cable. Hopefully the computer will recognize it in at least one of those modes.

Use the computer's file manager or another tool to simply backup the entire directory structure.

Reset the phone to factory defaults

Once the user has their hands on a real phone, it is likely they will never want to look at that Windows phone again. It is time to erase the Windows phone, there is no going back.

Go to the Settings and About and tap the factory reset option. It is important to do this before obliterating the account, otherwise there are scenarios where you could be locked out of the phone and unable to erase it.

Erasing may take some time. The phone will reboot and then display an animation of some gears spinning around for a few minutes and then reboot again. Wait for it to completely erase.

Permanently close the Microsoft account

Keeping track of multiple accounts and other services is tedious and frustrating for most people, especially with services that try to force the user to receive email in different places.

You can help eliminate user fatigue by helping them permanently close the account so they never have to worry about it again.

Follow the instructions on the Microsoft site.

At some point it will suggest certain actions you should take before closing the account, most can be ignored. One thing you should do is remove the link between the account ID and the phone. It is a good idea to do this as otherwise you may have problems erasing the device, if you haven't already done so. Before completely closing the account, also verify that the factory reset of the phone completed successfully.

Dispose of the Windows phone safely

If you can identify any faults with the phone, the user may be able to return it under the terms of the warranty. Some phone companies may allow the user to exchange it for something more desirable when it fails under warranty.

It may be tempting to sell the phone to a complete stranger on eBay or install a custom ROM on it. In practice, neither option may be worth the time and effort involved. You may be tempted to put it beyond use so nobody else will suffer with it, but please try to do so in a way that is respectful of the environment.

Putting the data into a new phone

Prepare the new phone with a suitable ROM such as Replicant or Cyanogenmod.

Install the F-Droid app on the new phone.

From F-droid, install the DAVdroid app. DAVdroid will allow you to quickly sync the new phone against any arbitrary CalDAV and WebDAV server to populate it with the user's calendar and contact / address book data.

Now is a good time to install other interesting apps like Lumicall, Conversations and K-9 Mail.

Categories: Elsewhere

Juliana Louback: PaperTrail - Powered by IBM Watson

Sat, 14/11/2015 - 11:06

On the final semester of my MSc program at Columbia SEAS, I was lucky enough to be able to attend a seminar course taught by Alfio Gliozzo entitled Q&A with IBM Watson. A significant part of the course is dedicated to learning how to leverage the services and resources available on the Watson Developer Cloud. This post describes the course project my team developed, the PaperTrail application.

Project Proposal

Create an application to assist in the development of future academic papers. Based on a paper’s initial proposal, Paper Trail predicts publications to be used as references or acknowledgement of prior art and provides a trend analysis of major topics and methods.

The objective is to speed the discovery of relevant papers early in the research process, and allow for early assessment of the depth of prior research concerning the initial proposal.

Meet the Team

Wesley Bruning, Software Engineer, MSc. in Computer Science

Xavier Gonzalez, Industrial Engineer, MSc. in Data Science

Juliana Louback, Software Engineer, MSc. in Computer Science

Aaron Zakem, Patent Attorney, MSc. in Computer Science

Prior Art

A significant amount of attention has been given to this topic over the past few decades. The table below shows the work the team deemed most relevant due to recency, accuracy and similarity of functionality.

The variation in accuracy displayed is a result of experimentation with different dataset sizes and algorithm variations. More information and details can be found in the prior art report.

The main differentiator of PaperTrail is that it provides a form of access to the citation prediction and trend analysis algorithms. With the exception of the project by McNee et al., these algorithms aren't currently available for general use. That project's application is open to use, but its objective is to rank publications and authors for given topics.


Citation Prediction: PaperTrail builds on the work done by Wolski's team in Fall 2014. This algorithm builds a reference graph used to define research communities, each with an associated vector of topic scores generated by an LDA model. The papers in each research community are then ranked by importance within the community with a custom ranking algorithm. When a target document is given to the algorithm as input, the LDA model is used to generate a vector of the topics present in the document. The communities with the most similar topic vectors are selected, and the publications within these communities with the highest rank and greatest similarity to the input document are recommended as references. A more detailed description can be found here.
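
The exact similarity measure isn't specified here; cosine similarity between topic vectors is a common choice, and the community-selection step might look something like this sketch (all names are illustrative, not PaperTrail's actual code):

import math

def cosine_similarity(u, v):
    # Cosine of the angle between two topic-score vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar_communities(doc_topics, communities, k=3):
    # communities maps a community id to its aggregated topic vector
    ranked = sorted(communities.items(),
                    key=lambda item: cosine_similarity(doc_topics, item[1]),
                    reverse=True)
    return ranked[:k]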

Trend Analysis: Initially, the idea was to use the AlchemyData News API to obtain statistics on the number of publications on a given topic over time. However, with the exception of buzzwords (i.e. 'big data'), many more specialized topics appeared very infrequently in news articles, if at all. This isn't entirely surprising given the target audience of PaperTrail. As a workaround, we use the Alchemy Language API to extract keywords from the abstracts in the dataset, along with relevance scores. The PaperTrail database can then be queried for entry counts for a given year and keyword to provide an indication of publication trends in academia. Note that the Alchemy Language API extracts multiple-word 'keywords' as well as single words.
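
That per-year count is a simple aggregation query; a minimal sketch with an assumed schema (table and column names are illustrative, not PaperTrail's actual schema):

import sqlite3

def publication_counts(db_path, keyword):
    # Count dataset entries per year whose extracted keywords
    # contain the given keyword
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT year, COUNT(*) FROM papers "
        "WHERE keywords LIKE ? GROUP BY year ORDER BY year",
        ('%' + keyword + '%',)).fetchall()
    conn.close()
    return rows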


To maintain consistency with Wolski's project, we are using the DBLP data. The DBLP-Citation-network V5 dataset contains 1,572,277 entries; we are limited to entries that contain both abstracts and citations, bringing the dataset size down to 265,865 entries.


A high-level visualization of the project architecture is displayed below. Before launching PaperTrail, it's necessary to train Wolski's algorithm offline. Currently, documentation on the performance of said algorithm is unavailable; the PaperTrail project will include an evaluation phase and report the findings.

The PaperTrail app and database will be hosted on the Bluemix Platform.

Status Report

Phases completed:

  • Project design

  • Prior art research

  • Data cleansing

  • Development and deployment of an alpha version of the PaperTrail app

Phases under development:

  • Algorithm training and evaluation

  • Keyword extraction

  • MapReduce of publication frequency by year and topic

  • Data visualization component

Categories: Elsewhere

Craig Small: Mixing pysnmp and stdin

Sat, 14/11/2015 - 08:04

Depending on the application, sometimes you want to have some socket operations going (such as loading a website) while also reading stdin. There are plenty of examples of this in Python, which usually boil down to making stdin behave like a socket and mixing it into the list of sockets select() cares about.
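
As a refresher, a minimal sketch of that common select()-based pattern:

import select
import sys

# Wait up to half a second for input to appear on stdin; real code
# would include its network sockets in the first list as well.
readable, _, _ =[sys.stdin], [], [], 0.5)
if sys.stdin in readable:
    line = sys.stdin.readline()
    print("you typed: {}".format(line.strip()))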

A while ago I asked on an email list whether I could have pysnmp use a different socket map so I could add my own sockets (UDP, TCP and a zmq socket, to name a few), and Ilya, the author of pysnmp, explained how pysnmp can use a foreign socket map.

The sample code below is merely a mixture of Ilya's example code and the way stdin gets mixed into the fold. I have also updated it to the high-level pysnmp API, which explains the slight differences in the calls.

  1. from time import time
  2. import sys
  3. import asyncore
  4. from pysnmp.hlapi import asyncore as snmpAC
  5. from pysnmp.carrier.asynsock.dispatch import AsynsockDispatcher
  8. class CmdlineClient(asyncore.file_dispatcher):
  9.     def handle_read(self):
  10.         buf = self.recv(1024)
  11.         print "you said {}".format(buf)
  14. def myCallback(snmpEngine, sendRequestHandle, errorIndication,
  15.                errorStatus, errorIndex, varBinds, cbCtx):
  16.     print "myCallback!!"
  17.     if errorIndication:
  18.         print(errorIndication)
  19.         return
  20.     if errorStatus:
  21.         print('%s at %s' % (errorStatus.prettyPrint(),
  22.               errorIndex and varBinds[int(errorIndex)-1] or '?')
  23.         )
  24.         return
  26.     for oid, val in varBinds:
  27.         if val is None:
  28.             print(oid.prettyPrint())
  29.         else:
  30.             print('%s = %s' % (oid.prettyPrint(), val.prettyPrint()))
  32. sharedSocketMap = {}
  33. transportDispatcher = AsynsockDispatcher()
  34. transportDispatcher.setSocketMap(sharedSocketMap)
  35. snmpEngine = snmpAC.SnmpEngine()
  36. snmpEngine.registerTransportDispatcher(transportDispatcher)
  37. sharedSocketMap[sys.stdin] = CmdlineClient(sys.stdin)
  39. snmpAC.getCmd(
  40.     snmpEngine,
  41.     snmpAC.CommunityData('public'),
  42.     snmpAC.UdpTransportTarget(('', 161)),
  43.     snmpAC.ContextData(),
  44.     snmpAC.ObjectType(
  45.         snmpAC.ObjectIdentity('SNMPv2-MIB', 'sysDescr', 0)),
  46.     cbFun=myCallback)
  48. while True:
  49.     asyncore.poll(timeout=0.5, map=sharedSocketMap)
  50.     if transportDispatcher.jobsArePending() or transportDispatcher.transportsAreWorking():
  51.         transportDispatcher.handleTimerTick(time())

Some interesting lines from the above code:

  • Lines 8-11 are the stdin class that is called (or rather its handle_read method is) when there is text available on stdin.
  • Line 34 is where pysnmp is told to use our socket map and not its inbuilt one
  • Line 37 is where we use the socket map to specify which handler is called when input arrives on stdin.
  • Lines 39-46 are sending a SNMP query using the high-level API
  • Lines 48-51 are my simple socket poller

With all this I can handle keyboard presses and network traffic, such as a simple SNMP poll.

Categories: Elsewhere

Francois Marier: How Tracking Protection works in Firefox

Fri, 13/11/2015 - 21:40

Firefox 42, which was released last week, introduced a new feature in its Private Browsing mode: tracking protection.

If you are interested in how this list is put together and then used in Firefox, this post is for you.

Safe Browsing lists

There are many possible ways to download URL lists to the browser and check against that list before loading anything. One of those is already implemented as part of our malware and phishing protection. It uses the Safe Browsing v2.2 protocol.

In a nutshell, the way that this works is that each URL on the block list is hashed (using SHA-256) and then that list of hashes is downloaded by Firefox and stored into a data structure on disk:

  • ~/.cache/mozilla/firefox/XXXX/safebrowsing/mozstd-track* on Linux
  • ~/Library/Caches/Firefox/Profiles/XXXX/safebrowsing/mozstd-track* on Mac
  • C:\Users\XXXX\AppData\Local\mozilla\firefox\profiles\XXXX\safebrowsing\mozstd-track* on Windows
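
The hashing step itself is easy to reproduce; a minimal sketch (the real Safe Browsing canonicalization rules are more involved, and '' is a made-up entry):

import hashlib

# SHA-256 of a canonicalized host/path expression, as stored in the list
entry = b""
print(hashlib.sha256(entry).hexdigest())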

This sbdbdump script can be used to extract the hashes contained in these files and will output something like this:

$ ~/sbdbdump/ -v .
- Reading sbstore: mozstd-track-digest256
[mozstd-track-digest256] magic 1231AF3B Version 3
NumAddChunk: 1 NumSubChunk: 0 NumAddPrefix: 0 NumSubPrefix: 0 NumAddComplete: 1696 NumSubComplete: 0
[mozstd-track-digest256] AddChunks: 1445465225
[mozstd-track-digest256] SubChunks:
[mozstd-track-digest256] addComplete[chunk:1445465225] e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5
[mozstd-track-digest256] MD5: 81a8becb0903de19351427b24921a772

The name of the blocklist being dumped here (mozstd-track-digest256) is set in the urlclassifier.trackingTable preference which you can find in about:config. The most important part of the output shown above is the addComplete line which contains a hash that we will see again in a later section.

List lookups

Once it's time to load a resource, Firefox hashes the URL, as well as a few variations of it, and then looks for it in the local lists.

If there's no match, then the load proceeds. If there's a match, then we do an additional check against a pairwise allowlist.

The pairwise allowlist (hardcoded in the urlclassifier.trackingWhitelistTable pref) is designed to encode what we call "entity relationships". The list groups related domains together for the purpose of checking whether a load is first or third party (e.g. two domains owned by the same company belong to the same entity).

Entries on this list (named mozstd-trackwhite-digest256) pair a top-level site with a resource domain, which translates to "if you're on that site, then don't block resources from that domain".

If there's a match on the second list, we don't block the load. It's only when we get a match on the first list and not the second one that we go ahead and cancel the network load.
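
In pseudo-Python, that decision looks something like the simplified sketch below. This is not Firefox's actual implementation, and the exact way the pairwise entries are encoded and hashed is glossed over:

def should_cancel_load(resource_hash, pair_hash, blocklist, allowlist):
    # blocklist and allowlist are sets of SHA-256 hashes taken from the
    # mozstd-track and mozstd-trackwhite lists respectively.
    if resource_hash not in blocklist:
        return False  # no match: the load proceeds
    if pair_hash in allowlist:
        return False  # first and third party share an entity: don't block
    return True       # match on the first list only: cancel the load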

If you visit our test page, you will see tracking protection in action with a shield icon in the URL bar. Opening the developer tool console will expose the URL of the resource that was blocked:

The resource at "" was blocked because tracking protection is enabled.

Creating the lists

The blocklist is created by Disconnect according to their definition of tracking.

The Disconnect list is on their GitHub page, but the copy we use in Firefox is the copy we have in our own repository. Similarly, the Disconnect entity list is from here, but our copy is in our repository. Should you wish to be notified of any changes to the lists, you can simply subscribe to this Atom feed.

To convert this JSON-formatted list into the binary format needed by the Safe Browsing code, we run a custom list generation script whenever the list changes on GitHub.

If you run that script locally using the same configuration as our server stack, you can see the conversion from the original list to the binary hashes.

Here's a sample entry from the mozstd-track-digest256.log file:

[m] >> [canonicalized] [hash] e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5

and one from mozstd-trackwhite-digest256.log:

[entity] Twitter >> (canonicalized), hash a8e9e3456f46dbe49551c7da3860f64393d8f9d96f42b5ae86927722467577df

This, in combination with the sbdbdump script mentioned earlier, will allow you to audit the contents of the local lists.

Serving the lists

The way that the binary lists are served to Firefox is through a custom server component written by Mozilla: shavar.

Every hour, Firefox requests updates from the shavar server. If new data is available, then the whole list is downloaded again. Otherwise, all it receives in return is an empty 204 response.

Should you want to play with it and run your own server, follow the installation instructions and then go into about:config to change these preferences to point to your own instance:

browser.trackingprotection.gethashURL
browser.trackingprotection.updateURL

Note that on Firefox 43 and later, these prefs have been renamed to:

browser.safebrowsing.provider.mozilla.gethashURL
browser.safebrowsing.provider.mozilla.updateURL

Learn more

If you want to learn more about how tracking protection works in Firefox, you can find all of the technical details on the Mozilla wiki or you can ask questions on our mailing list.

Thanks to Tanvi Vyas for reviewing a draft of this post.

Categories: Elsewhere

John Goerzen: Memories of a printer

Fri, 13/11/2015 - 19:42

I have a friend who hates printers. I’ll call him “Mark”, because that, incidentally, is his name. His hatred for printers is partly my fault, but that is, ahem, a story for another time that involves him returning from a battle with a printer with a combination of weld dust, toner, and a deep scowl on his face.

I also tend to hate printers. Driver issues, crinkled paper, toner spilling all over the place… everybody hates printers.

But there is exactly one printer that I have never hated. It’s almost 20 years old, and has some stories to tell.

Nearly 20 years ago, I was about to move out of my parents’ house, and I needed a printer. I bought a LaserJet 6MP. This printer ought to have been made by Nokia. It’s still running fine, 18 years later. It turned out to be one of the best investments in computing equipment I’ve ever made. Its operating costs, by now, are cheaper than just about any printer you can buy today — less than one cent per page. It has been supported by every major operating system for years.

PostScript was important, because back then running Ghostscript to convert to PCL was both slow and a little error-prone. PostScript meant I didn’t need a finicky lpr/lprng driver on my Linux workstation to print. It just… printed. (Hat tip to anyone else that remembers the trial and error of constructing an /etc/printcap that would print both ASCII and PostScript files correctly!)

Out of this printer have come plane and train tickets, taking me across the country to visit family and across the world to visit friends. It’s printed resumes and recipes, music and university papers. I even printed wedding invitations and envelopes on it two years ago, painstakingly typeset in LaTeX and TeXmacs. I remember standing at the printer in the basement one evening, feeding envelope after envelope into the manual feed slot. (OK, so it did choke on a couple of envelopes, but overall it all worked great.)

The problem, though, is that it needs a parallel port. I haven’t had a PC with one of those in a long while. A few years ago, in a moment of foresight, I bought a little converter box that has an Ethernet port and a parallel port, with the idea that it would pay for itself by letting me not maintain some old PC just to print. Well, it did, but now the converter box is dying! And they don’t make them anymore. So I finally threw in the towel and bought a new LaserJet.

It cost a third of what the 6MP did, has a copier and scanner, prints in color, does duplexing, has wifi… and, yes, still supports PostScript — strangely enough, a deciding factor in going with HP over Brother once again. (The other was image quality.)

We shall see if I am still using it when I’m 50.

Categories: Elsewhere

Daniel Pocock: How much video RAM for a 4k monitor?

Fri, 13/11/2015 - 18:49

I previously wrote about my experience moving to a 4K monitor.

I've been relatively happy with it except for one thing: I found that 1GB of video RAM simply isn't sufficient for a stable system. This wasn't immediately obvious, as everything appeared to work in the beginning, but over time the shortfall became clear.

I'm not using it for gaming or 3D rendering. My typical desktop involves several virtual workspaces with terminal windows, web browsers, a mail client, IRC and Eclipse. Sometimes I use vlc for playing media files.

Using the nvidia-settings tool, I observed that the Used Dedicated memory statistic would frequently reach the maximum, 1024MB. On a few occasions, X crashed with errors indicating it was out of memory.
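
If you want to watch the same statistic from a terminal, something like this should work (assuming the proprietary driver's nvidia-settings tool; the attribute name may vary between driver versions):

$ nvidia-settings -q '[gpu:0]/UsedDedicatedGPUMemory' -t
1024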

After observing these problems, I put another card with 4GB video RAM into the system and I've observed it using between 1024 MB and 1300 MB at any one time. This leaves me feeling that people with only basic expectations for their desktop should aim for at least 2GB video RAM for 4k.

That said, I've continued to enjoy many benefits of computing with a 4K monitor. In addition to those mentioned in my previous blog, here are some things that were easier for me with 4K:

  • Using gitk to look through many commits on the master branch of reSIProcate and cherry-pick some things to the resiprocate-1.9 branch. gitk only used half the screen and I was able to use the right hand side of the screen to look at the code in an editor in more detail.
  • Simultaneously monitoring logs from two Android devices running Lumicall and a repro SIP proxy server in three terminal windows arranged side by side, up to 125 lines of text in each.
  • Using WebRTC sites in the Mozilla browser while having a browser console window, source code and SIP proxy logs all open at the same time, none of them overlapping.

You can do much of this with a pair of monitors, but there is something quite nice about doing it all on a single 4K screen.

Categories: Elsewhere

Daniel Pocock: Building teams around SIP and XMPP in Debian and Fedora

Fri, 13/11/2015 - 12:52

I've recently started a discussion on the Fedora devel mailing list about building a team to collaborate on RTC services (SIP, XMPP, TURN and WebRTC) for the Fedora community. We already started a similar team within Debian.

This isn't only for developers or package maintainers; virtually anybody with a keen interest in free software can help. Testing different softphones and putting screenshots on the wiki can help a lot (the Debian wiki already provides some examples). The site is not intended to be an advertisement for my web design skills, and anybody with expertise in design would be very welcome to contribute.

Teamwork in this endeavor can provide many benefits:

  • Sharing knowledge about RTC, for use within our communities and also for other communities using the free and open technology
  • Engaging with collaborators who are not involved in packaging teams, for example, the Debian RTC team has also had interest from upstream developers who are not on other Debian or Fedora mailing lists
  • Minimizing the effort required by the system administrators (the DSA team in Debian or Infrastructure team in Fedora) by triaging user problems and planning and testing any proposed changes.
  • Freeing up developer time to work on new features, such as the exciting work I'm doing on telepathy-resiprocate.

There are also many opportunities for project work that go beyond traditional packaging responsibilities. Wouldn't it be interesting to find ways to integrate the publish/subscribe capabilities of SIP and XMPP with the Fedmsg infrastructure?

Bringing XMPP to fedoraproject.org

We recently launched XMPP for debian.org and it would not be hard to replicate for fedoraproject.org users. Sure, some people are happy running their own XMPP servers. There are just as many people who prefer to focus on development and have something like XMPP provided for them.

With the strong emphasis on building a roster/buddy-list, XMPP can also help to facilitate long-term engagement in the community and users may identify more closely with the project.

I haven't offered XMPP on the trial service because it would be inconvenient for people to migrate buddy lists to the project's own domain when the service is officially adopted.

Collaboration across communities

There are various other places where we can share knowledge between teams in different communities and people are invited to participate.

The Free-RTC mailing list is a great place to discuss free RTC strategies and initiatives.

The XMPP operators mailing list provides a forum to discuss operational issues in the XMPP space, such as keeping out the spammers.

Would you like to participate?

Please consider joining some of the mailing lists I've mentioned, replying to the thread on the Fedora devel mailing list, volunteering for the Debian RTC team or emailing me personally.

Categories: Elsewhere

Raphaël Hertzog: Freexian’s report about Debian Long Term Support, October 2015

Fri, 13/11/2015 - 11:03

Like each month, here comes a report about the work of paid contributors to Debian LTS.

Individual reports

In October, 85.50 work hours have been dispatched among 8 paid contributors. Their reports are available:

  • Ben Hutchings did 14 hours (13.5h allocated, thus catching up only 0.5 hours of the 5.5 extra hours left over from the previous month).
  • Chris Lamb did 11 hours (12h allocated, he will catch up later).
  • Guido Günther did 4 hours (out of 8 hours allocated, thus keeping 4 extra hours for November).
  • Mike Gabriel did nothing (out of 8 hours allocated, he will catch up in November).
  • Raphaël Hertzog did 13.25 hours.
  • Santiago Ruano Rincón did 13.5 hours.
  • Scott Kitterman did 8 hours (4 hours allocated and 4 hours remaining from September).
  • Thorsten Alteholz did 13.25 hours.
Evolution of the situation

November set a new record with 114.5 hours funded. This is mainly thanks to our first Platinum sponsor: TOSHIBA (through Toshiba Software Development Vietnam). They don’t know yet if they can sponsor us in the long term (they hope so), but it’s still nice news, as we jumped from 50% to 65% of our objective of funding the equivalent of a full-time position with a single new sponsor.

Currently no change is expected for next month as we don’t have any other new sponsor in the process of joining us.

We still need more support to be able to cover all the packages we could not afford to support during the Squeeze cycle. We are currently discussing which packages we can or cannot support on the LTS list; see the thread Unsupported packages for Wheezy LTS for the current situation.

In terms of security updates waiting to be handled, the situation is close to last month's: the dla-needed.txt file lists 21 packages awaiting an update (6 more than last month), and the list of open vulnerabilities in Squeeze shows about 23 affected packages in total (exactly as last month).

Thanks to our sponsors



Categories: Elsewhere

Jaldhar Vyas: New Year Resolutions

Fri, 13/11/2015 - 07:02

Now this could just be the prodigious amounts of sugar and ghee I've been eating the past few days, but I'm bursting with energy for working on Debian so here is a list of things I want to do in the upcoming year.

  • Finish packaging the latest Dovecot (2.2.19).

This will happen soon. Sooner if you people will stop filing bugs all the time. The rest of the items are more long term.

  • Recruit more members for the Dovecot packaging team.
  • Undertake to respond to all bug reports and other Debian correspondence within three days.
  • Finally get the Minix port to a state where other people can use it and work on it too.
  • Relearn Debian development from scratch so that I can be familiar with and apply the latest techniques. Write down what I've learned.
  • Arrange (or more realistically get someone to arrange) a Debian event in Northern or Eastern India.
  • Watch all the Debconf 15 videos.

To keep all of this on track, I shall adopt the practice lately seen on Debian Planet of periodically posting a report of my Debian activities.

Categories: Elsewhere

Carl Chenet: db2twitter 0.2 released

Thu, 12/11/2015 - 19:00

db2twitter 0.2 was just released! If you missed my last post about it, db2twitter automatically extracts fields from your database, uses them to fill a tweet template, and sends the tweet.

db2twitter is developed by and run for the job board of the French-speaking Free Software and Open Source community.

The main new feature allows sending tweets only during authorized days/hours, with the following content in your db2twitter.ini file:

[timer]
days=mon-fri,
hours=0-11,14-17,
  • The line days=mon-fri specifies that db2twitter can send tweets only from Monday to Friday.
  • The line hours=0-11,14-17 specifies that db2twitter can send tweets from midnight to 11AM (inclusive) and from 2PM to 5PM (inclusive).

Given that db2twitter will mostly be launched via cron, this allows tweets detected since the last db2twitter run to be sent only during user-defined hours. And that’s cool!
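
To illustrate how such a window might be evaluated, here is a minimal sketch in Python. It is not db2twitter's actual code and assumes hour ranges are inclusive, as described above:

from datetime import datetime

def in_window(now, days="mon-fri", hours="0-11,14-17"):
    dow = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
    first, last = days.split("-")
    if not dow.index(first) <= now.weekday() <= dow.index(last):
        return False
    # ignore empty spans left by trailing commas in the config
    for span in filter(None, hours.split(",")):
        lo, hi = (int(x) for x in span.split("-"))
        if lo <= now.hour <= hi:
            return True
    return False

print(in_window(datetime.now()))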

db2twitter is coded in Python 3.4, uses SQLAlchemy (see the supported database types) and Tweepy. The official documentation is available on readthedocs.

Categories: Elsewhere

Lucy Wayland: Differences bring us together

Wed, 11/11/2015 - 22:40

On the 13th of May this year, I legally became Lucy Wayland. I’d been living as a woman full time for a couple of months before that, but that is when two dear friends witnessed my name change. I am going to post about the whole experience when it finally reaches the completion zone.

However, this last weekend just gone, I was helping out with the Cambridge (UK) MiniDebConf. I was mostly gophering and front-desk-helpering, with side orders of beverages, so I missed most of the talks. Which is not the point.

I met nearly everybody at the conference. Many of them knew me as Jon, a goateed man. I was there as Lucy, a woman. And nobody batted an eyelid.

  • Not a single person used my old name
  • Not a single person mis-gendered me
  • Not a single person referred to my transition

The only time I had to produce my Deed Poll was for keysigning, as I still do not have photo ID with my new name on it. I proffered it along with my passport, so there was no embarrassment.

I know other people within Debian have gone through the same process. However, I just have to say how wonderful it is, to be accepted just that way.

And hence the title of my article. Our differences bring us together. So many different people from so many different cultures came together, wanted to create, and my change of gender was just irrelevant.

And that’s how it should be.

Categories: Elsewhere

Bits from Debian: New Debian Developers and Maintainers (September and October 2015)

Wed, 11/11/2015 - 22:35

The following contributors got their Debian Developer accounts in the last two months:

  • ChangZhuo Chen (czchen)
  • Eugene Zhukov (eugene)
  • Hugo Lefeuvre (hle)
  • Milan Kupcevic (milan)
  • Timo Weingärtner (tiwe)
  • Uwe Kleine-König (ukleinek)
  • Bernhard Schmidt (berni)
  • Stein Magnus Jodal (jodal)
  • Prach Pongpanich (prach)
  • Markus Koschany (apo)
  • Andy Simpkins (rattustrattus)

The following contributors were added as Debian Maintainers in the last two months:

  • Miguel A. Colón Vélez
  • Afif Elghraoui
  • Bastien Roucariès
  • Carsten Schoenert
  • Tomasz Nitecki
  • Christoph Ulrich Scholler
  • Mechtilde Stehmann
  • Alexandre Viau
  • Daniele Tricoli
  • Russell Sim
  • Benda Xu
  • Andrew Kelley
  • Ivan Udovichenko
  • Shih-Yuan Lee
  • Edward Betts
  • Punit Agrawal
  • Andreas Boll
  • Dave Hibberd
  • Alexandre Detiste
  • Marcio de Souza Oliveira
  • Andrew Ayer
  • Alf Gaida


Categories: Elsewhere